Method for allocating computing resources and related devices
By acquiring task configuration information and associated resource types, and selecting appropriate computing resources to execute AI computing tasks, the problem of low resource utilization in AI computing services is solved, achieving more efficient resource utilization and cost optimization.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- TENCENT TECHNOLOGY (SHENZHEN) CO LTD
- Filing Date
- 2024-12-18
- Publication Date
- 2026-06-19
Figure CN122240290A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of computer technology, and provides a method and related apparatus for allocating computing resources.
Background Technology
[0002] With the continuous development of artificial intelligence (AI) technology, the number of AI computing services keeps growing. Each computing service must be provided with a certain amount of computing resources.
[0003] In related technologies, computing resources provided by computing devices are typically allocated to computing services at the whole-device granularity or the whole-card granularity. The whole-device granularity refers to providing the computing resources of all computing units in a computing device to the computing service, while the whole-card granularity refers to providing the computing resources of a computing unit to the computing service. For example, a computing unit can be a central processing unit (CPU), a graphics processing unit (GPU), or an application-specific integrated circuit (ASIC).
[0004] However, for most AI computing tasks the requested resources and the resources actually used are inconsistent: the request is usually much larger than the actual usage, which leads to low resource utilization. For example, a computing task might request the computing resources of one GPU while actually needing only a portion of that GPU. Allocating computing resources at the whole-GPU level then leaves computing resources idle, fails to match the task's actual computing power needs, and lowers resource utilization.
Summary of the Invention
[0005] This application provides a computing resource allocation method and related apparatus to improve the resource utilization rate of computing resources.
[0006] In one aspect, embodiments of this application provide a method for allocating computing resources, including:
[0007] When an execution operation is triggered for a target computing task, the task configuration information of the target computing task is obtained; wherein, the task configuration information includes: target resource quantity and task-related information, and the task-related information represents: the execution requirements during the execution of the target computing task;
[0008] Based on the preset association between task information and resource types, at least one candidate resource type associated with the task-related information is selected from each reference resource type; wherein each reference resource type is a physical computing resource or a virtual computing resource, and the virtual computing resource is obtained by partitioning the physical computing resource using virtualization technology;
[0009] Obtain the schedulable amount of computing resources corresponding to each of the at least one candidate resource type;
[0010] If among the at least one candidate resource type, there exists a target resource type whose schedulable quantity satisfies the target resource quantity, then the target computing task is executed using the computing resources corresponding to the target resource type.
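The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the claimed implementation: the names `ASSOCIATION`, `TaskConfig`, `select_candidates`, and `pick_target_type`, the resource type labels, and the reading of "satisfies" as "schedulable amount is at least the target resource quantity" are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical preset association between task information and resource types.
ASSOCIATION: Dict[str, List[str]] = {
    "high_stability": ["whole_machine", "whole_card"],
    "fine_grained": ["virtualized"],
}


@dataclass
class TaskConfig:
    target_quantity: float  # target resource quantity, in cards (assumed unit)
    task_info: List[str]    # task-related information (execution requirements)


def select_candidates(config: TaskConfig, reference_types: List[str]) -> List[str]:
    """Select candidate resource types associated with the task-related information."""
    candidates: List[str] = []
    for info in config.task_info:
        for rtype in ASSOCIATION.get(info, []):
            if rtype in reference_types and rtype not in candidates:
                candidates.append(rtype)
    return candidates


def pick_target_type(
    config: TaskConfig,
    reference_types: List[str],
    schedulable: Dict[str, float],
) -> Optional[str]:
    """Return the first candidate type whose schedulable amount covers the request."""
    for rtype in select_candidates(config, reference_types):
        # Assumption: "satisfies" means schedulable amount >= target quantity.
        if schedulable.get(rtype, 0.0) >= config.target_quantity:
            return rtype
    return None
```

For instance, a task that requests 2 cards with a high-stability requirement would be matched to `whole_card` if no whole machine is free but 4 whole cards are schedulable.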
[0011] In another aspect, embodiments of this application provide a computing resource allocation apparatus, including:
[0012] An information acquisition unit is used to acquire task configuration information of a target computing task when an execution operation is triggered for the target computing task; wherein, the task configuration information includes: target resource quantity and task-related information, and the task-related information represents: the execution requirements during the execution of the target computing task;
[0013] The resource selection unit is used to select at least one candidate resource type associated with the task-related information from each reference resource type based on the preset association relationship between task information and resource type; wherein each reference resource type is a physical computing resource or a virtual computing resource, and the virtual computing resource is obtained by partitioning the physical computing resource using virtualization technology;
[0014] A quota acquisition unit is used to acquire the schedulable amount of computing resources corresponding to each of the at least one candidate resource type;
[0015] The task execution unit is configured to execute the target computing task using the computing resources corresponding to the target resource type if there is a target resource type among the at least one candidate resource type whose schedulable quantity satisfies the target resource quantity.
[0016] In one possible implementation, the computing resources of the physical class include at least one of whole machine resources and whole card resources, wherein the whole machine resources represent the computing resources provided by a computing device, and the whole card resources represent the computing resources provided by a portion of the computing units in a computing device.
[0017] In one possible implementation, the computing resources of the virtual class include: virtualized resources, which represent the computing resources provided by a portion of the logical units in a computing unit.
[0018] In one possible implementation, each of the reference resource types is a computing resource of the physical class, a computing resource of the virtual class, or an offline mixed-use resource. The offline mixed-use resource represents an idle resource among the allocated resources used for other computing services. The allocated resources include the computing resources of the physical class and / or the computing resources of the virtual class.
[0019] In one possible implementation, when obtaining the schedulable amount of computing resources corresponding to each of the at least one candidate resource type, the quota acquisition unit is specifically used for:
[0020] From a resource pool containing various types of computing resources, at least one set of computing resources is divided out, wherein the at least one set of computing resources includes the set of computing resources corresponding to the at least one candidate resource type;
[0021] Based on the total amount of resources and resource usage of each of the at least one set of computing resources, the schedulable amount of computing resources corresponding to each of the at least one candidate resource types is obtained.
[0022] In one possible implementation, when obtaining the schedulable amount of computing resources corresponding to each of the at least one candidate resource types based on the total resource amount and resource usage of each of the at least one set of computing resources, the quota acquisition unit is specifically used for:
[0023] Obtain the resource scheduling view corresponding to each of the at least one set of computing resources, and each resource scheduling view contains the total amount of resources and the resource usage of the corresponding set of computing resources;
[0024] Based on the total amount of resources and resource usage in the resource scheduling view, the schedulable amount of computing resources corresponding to each of the at least one candidate resource type is obtained.
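As a minimal sketch of how a schedulable amount might be derived from a resource scheduling view, the Python below assumes that the schedulable amount is simply the total amount minus the current usage. The text only says the amount is obtained "based on" these two values, so the subtraction, the `SchedulingView` class, and the function name are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SchedulingView:
    total: float  # total amount of resources in the set, in cards (assumed unit)
    used: float   # current resource usage of the set, in cards


def schedulable_amounts(
    views: Dict[str, SchedulingView],
    candidate_types: List[str],
) -> Dict[str, float]:
    """Compute the schedulable amount for each candidate type that has a view."""
    return {
        rtype: max(views[rtype].total - views[rtype].used, 0.0)
        for rtype in candidate_types
        if rtype in views
    }
```

Clamping at zero is a defensive choice in this sketch so that an over-committed set never reports a negative schedulable amount.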
[0025] In one possible implementation, before executing the target computing task using the computing resources corresponding to the target resource type, the task execution unit is further configured to:
[0026] Update the resource status of the computing resources corresponding to the target resource type from idle to allocated;
[0027] After executing the target computing task using the computing resources corresponding to the target resource type, the task execution unit is further configured to:
[0028] Once the target computing task is completed, update the resource status of the computing resources corresponding to the target resource type from allocated back to idle.
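The idle, allocated, idle status cycle described above maps naturally onto a context manager. The sketch below is illustrative only: the `status` dictionary, the resource identifier `"gpu-0"`, and the function name `reserved` are hypothetical, and releasing the resource even when the task raises an error is a design assumption that the text does not state.

```python
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def reserved(status: Dict[str, str], resource_id: str) -> Iterator[str]:
    """Mark a resource as allocated while a task runs, then return it to idle."""
    if status[resource_id] != "idle":
        raise RuntimeError(f"{resource_id} is not idle")
    status[resource_id] = "allocated"  # before execution: idle -> allocated
    try:
        yield resource_id
    finally:
        # After execution (assumed: also on failure): allocated -> idle.
        status[resource_id] = "idle"
```

A caller would wrap task execution as `with reserved(status, "gpu-0"): run_task()`, where `run_task` stands for whatever executes the target computing task.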
[0029] In one possible implementation, when the target computing task is executed using the computing resources corresponding to the target resource type, the task execution unit is specifically used for:
[0030] Using the computing resources corresponding to the target resource type, a target computing power container is constructed, which is used to provide a runtime environment for executing the target computing task;
[0031] The target computing task is executed within the target computing power container.
[0032] In one possible implementation, the task configuration information further includes: the task priority of the target computing task;
[0033] When constructing the target computing power container using the computing resources corresponding to the target resource type, the task execution unit is specifically used for:
[0034] If the target resource type is an offline mixed-use resource, then the resource allocation order of the target computing task is determined according to the task priority; wherein, the offline mixed-use resource represents the idle resources among the allocated resources used for other computing services, and the allocated resources include the computing resources of the physical class and / or the computing resources of the virtual class;
[0035] According to the resource allocation order, an offline mixed-use container is constructed as the target computing power container using the computing resources corresponding to the offline mixed-use resource.
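One way to picture priority-ordered allocation of idle capacity borrowed from already allocated resources is the Python sketch below. It rests on several assumptions that the text does not specify: a larger number means a higher priority, ties keep submission order, and a task that does not fit in the remaining idle capacity is skipped rather than queued. The function name and tuple layout are hypothetical.

```python
from typing import List, Tuple

Task = Tuple[str, int, float]  # (task name, task priority, requested cards)


def place_mixed_tasks(tasks: List[Task], idle_amount: float) -> List[str]:
    """Place tasks on idle capacity in descending priority order."""
    placed: List[str] = []
    # sorted() is stable, so equal priorities keep their submission order.
    for name, _priority, requested in sorted(tasks, key=lambda t: -t[1]):
        if requested <= idle_amount:
            idle_amount -= requested
            placed.append(name)
    return placed
```

With 1.5 idle cards and three tasks requesting 0.5, 1.0, and 1.0 cards at priorities 1, 3, and 2, the priority-3 task is placed first, the priority-2 task no longer fits, and the priority-1 task takes the remaining 0.5 card.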
[0036] In one possible implementation, the association between the task information and the resource type includes: a mapping relationship between resource requirement points and resource types;
[0037] When selecting at least one candidate resource type associated with the task-related information from among the reference resource types based on the preset association between task information and resource types, the resource selection unit is specifically used for:
[0038] Based on the keywords contained in the task-related information, extract at least one resource requirement point from the task-related information;
[0039] Based on the mapping relationship between the resource requirement points and resource types, candidate resource types corresponding to each of the at least one resource requirement point are selected from each reference resource type as at least one candidate resource type associated with the task-related information.
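The keyword-based selection described above can be sketched as two lookups: keywords to resource requirement points, then requirement points to resource types. Every keyword, requirement point, and resource type name in the tables below is a hypothetical placeholder; the actual mapping used by the application is not given in this section.

```python
from typing import Dict, List

# Hypothetical keyword -> resource requirement point table.
KEYWORD_TO_POINT: Dict[str, str] = {
    "stable": "high_stability",
    "elastic": "elastic_scheduling",
    "low cost": "low_cost",
}

# Hypothetical resource requirement point -> resource types mapping.
POINT_TO_TYPES: Dict[str, List[str]] = {
    "high_stability": ["whole_machine", "whole_card"],
    "elastic_scheduling": ["offline_mixed"],
    "low_cost": ["virtualized", "offline_mixed"],
}


def extract_requirement_points(task_info: str) -> List[str]:
    """Extract requirement points via simple substring keyword matching."""
    text = task_info.lower()
    return [point for keyword, point in KEYWORD_TO_POINT.items() if keyword in text]


def candidate_types(task_info: str, reference_types: List[str]) -> List[str]:
    """Collect candidate types for every requirement point, without duplicates."""
    result: List[str] = []
    for point in extract_requirement_points(task_info):
        for rtype in POINT_TO_TYPES.get(point, []):
            if rtype in reference_types and rtype not in result:
                result.append(rtype)
    return result
```

Plain substring matching is the simplest possible stand-in here; it would, for example, also match "stable" inside "unstable", so a real extractor would need more careful tokenization.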
[0040] In another aspect, an electronic device is provided, including a processor and a memory, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the above-described method.
[0041] In another aspect, a computer-readable storage medium is provided, comprising a computer program that, when run on an electronic device, causes the electronic device to perform the steps of any of the methods described above.
[0042] In another aspect, a computer program product is provided, the program product including a computer program stored in a computer-readable storage medium, wherein a processor of an electronic device reads the computer program from the computer-readable storage medium and executes it, causing the electronic device to perform the steps of any of the methods described above.
[0043] In this embodiment of the application, when allocating resources to the target computing task, the task configuration information of the target computing task is first obtained, which includes the target resource quantity and task-related information.
[0044] Secondly, based on the pre-defined correlation between task information and resource types, candidate resource types associated with task-related information are selected from various reference resource types. Each reference resource type can be either a physical computing resource or a virtual computing resource. By supporting diverse resource types, the task requirements of different computing tasks can be met, which helps improve the resource utilization rate and service efficiency of computing resources. Furthermore, by utilizing task-related information, appropriate computing resources can be used for different computing tasks. This helps to meet the scenario-based customization needs of AI computing, providing different features and customized options. This allows computing tasks to select suitable target resource types based on user needs and preferences, thereby improving user satisfaction with AI computing services.
[0045] Finally, by utilizing the schedulable resources corresponding to the target resource type that meets the target resource quantity, the target computing task is executed. By selecting the target resource type that meets the target resource quantity to execute the target computing task, the normal execution of the target computing task can be guaranteed.
[0046] Other features and advantages of this application will be set forth in the description which follows, and will be apparent in part from the description, or may be learned by practicing the application. The objectives and other advantages of this application may be realized and obtained by means of the structures particularly pointed out in the written description, claims, and drawings.
Attached Figure Description
[0047] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, illustrate exemplary embodiments and are used to explain this application, but do not constitute an undue limitation of this application. In the drawings:
[0048] Figure 1 is a schematic diagram of an application scenario provided in the embodiments of this application;
[0049] Figure 2 is a flowchart of a computing resource allocation method provided in an embodiment of this application;
[0050] Figure 3 is a schematic diagram of a resource allocation method provided in an embodiment of this application;
[0051] Figure 4 is a schematic diagram of requirement points and their corresponding candidate resource types provided in an embodiment of this application;
[0052] Figure 5 is a schematic diagram of a computing resource set provided in an embodiment of this application;
[0053] Figure 6 is a schematic diagram of a mixed resource set provided in an embodiment of this application;
[0054] Figure 7 is a schematic diagram of a resource scheduling graph provided in an embodiment of this application;
[0055] Figure 8 is a schematic diagram of the structure of a computing power service system provided in the embodiments of this application;
[0056] Figure 9 is a logical diagram of a computing resource allocation process provided in an embodiment of this application;
[0057] Figure 10 is a flowchart of a computing resource allocation process provided in an embodiment of this application;
[0058] Figure 11 is a schematic diagram of the structure of a computing resource allocation device provided in the embodiments of this application;
[0059] Figure 12 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0060] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of this application will be clearly and completely described below with reference to the accompanying drawings of the embodiments of this application. Obviously, the described embodiments are only some embodiments of the technical solutions of this application, and not all embodiments. Based on the embodiments recorded in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the technical solutions of this application.
[0061] With the continuous development of AI technology, the number of AI computing services keeps growing. Each computing service must be provided with a certain amount of computing resources.
[0062] In related technologies, computing resources provided by computing devices are typically allocated to computing services at the whole-machine granularity or the whole-card granularity. The whole-machine granularity refers to providing all the computing resources of computing units such as CPU, GPU, and ASIC in a computing device to computing services, while the whole-card granularity refers to providing the computing resources of a single CPU, GPU, or ASIC to computing services.
[0063] However, for most AI computing tasks the requested resources and the resources actually used are inconsistent: the request is usually much larger than the actual usage, which leads to low resource utilization. For example, a computing task might request the computing resources of one GPU while actually needing only a portion of that GPU. Allocating computing resources at the whole-GPU level then leaves computing resources idle, fails to match the task's actual computing power needs, and lowers resource utilization.
[0064] Based on this, in the embodiments of this application, when allocating resources to the target computing task, the task configuration information of the target computing task is first obtained, which includes the target resource quantity and task-related information.
[0065] Secondly, based on the preset correlation between task information and resource types, candidate resource types associated with task-related information are selected from each reference resource type. Each reference resource type can be either a physical computing resource or a virtual computing resource. By supporting diverse resource types, the task requirements of different computing tasks can be met, which helps to improve the resource utilization rate and service efficiency of computing resources. Furthermore, the diversified resource types meet the diverse demands emerging in AI scenarios, which is conducive to supporting the rapid development of AI scenarios and thus improving the efficiency of AI research and development.
[0066] Furthermore, by utilizing task-related information, appropriate computing resources can be used for different computing tasks. This helps to meet the scenario-based customization needs of AI computing, providing different features and customized options. As a result, computing tasks can select the appropriate target resource type according to the user's needs and preferences, thereby improving the user satisfaction of AI computing services.
[0067] Finally, by utilizing the schedulable resources corresponding to the target resource type that meets the target resource quantity, the target computing task is executed. By selecting the target resource type that meets the target resource quantity to execute the target computing task, the normal execution of the target computing task can be guaranteed.
[0068] Furthermore, by using virtual computing resources, the available computing resources can be segmented more finely, allowing the computing power provided by computing devices to be used more fully, improving the utilization rate and service efficiency of computing resources, and also reducing computing costs.
[0069] The following is a brief introduction to the application scenarios to which the technical solutions of the embodiments of this application are applicable. It should be noted that the application scenarios described below are only for illustrating the embodiments of this application and are not intended to limit the scope. In specific implementation, the technical solutions provided by the embodiments of this application can be flexibly applied according to actual needs.
[0070] Figure 1 is a schematic diagram of an application scenario provided by an embodiment of this application. This scenario may include a terminal device 101 and a server 102.
[0071] Terminal device 101 can be, for example, a mobile phone, tablet computer (PAD), laptop computer, desktop computer, smart home appliance (such as a smart TV), smart in-vehicle device, smart wearable device, or aircraft. Server 102 can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery network (CDN) services, and big data and artificial intelligence platforms, but is not limited to these.
[0072] The computing resource allocation method in this embodiment can be executed by terminal device 101 alone, by server 102 alone, or jointly by terminal device 101 and server 102. For example, when an execution operation is triggered for a target computing task, terminal device 101 obtains the task configuration information of the target computing task. The task configuration information includes the target resource quantity and task-related information, which represents the execution requirements during the execution of the target computing task. Based on the preset association between task information and resource types, terminal device 101 selects at least one candidate resource type associated with the task-related information from each reference resource type. Each reference resource type is a physical computing resource or a virtual computing resource, and the virtual computing resource is obtained by partitioning the physical computing resource using virtualization technology. Terminal device 101 then obtains the schedulable amount of computing resources corresponding to each of the at least one candidate resource type. If there is a target resource type among the at least one candidate resource type whose schedulable amount satisfies the target resource quantity, the target computing task is executed using the computing resources corresponding to the target resource type. Alternatively, part of the above process can be executed by server 102. For example, when an execution operation is triggered for a target computing task, terminal device 101 obtains the task configuration information of the target computing task, selects at least one candidate resource type associated with the task-related information from the reference resource types based on the preset association between task information and resource types, and sends the candidate resource types to server 102. Server 102 then obtains the schedulable amount of computing resources corresponding to each of the at least one candidate resource type. If there is a target resource type among the at least one candidate resource type whose schedulable amount satisfies the target resource quantity, the computing resources corresponding to the target resource type are used to execute the target computing task. In practical applications, the division of steps can be configured according to the actual situation, and this application does not impose specific limitations here.
[0073] Both server 102 and terminal device 101 may include one or more processors, memory, and I/O interfaces for interaction. Furthermore, server 102 may be configured with a database for storing trained model parameters. The memory of both server 102 and terminal device 101 may also store the program instructions required by the computing resource allocation method provided in this embodiment. These program instructions, when executed by the processor, can be used to implement the computing resource allocation or model training process provided in this embodiment.
[0074] In some implementations, the user can trigger the allocation of computing resources for the target network traffic through the terminal device 101. The server 102 uses the computing resource allocation method of the present application embodiment to allocate computing resources for the target network traffic, generate traffic analysis results, and return them to the terminal device 101 for presentation.
[0075] In this embodiment, the terminal device 101 and the server 102 can communicate directly or indirectly through one or more networks. The network can be a wired network or a wireless network; for example, the wireless network can be a mobile cellular network or a Wireless Fidelity (Wi-Fi) network, and of course it can be another possible network. This embodiment does not limit this. It should be noted that the example shown in Figure 1 is merely illustrative; in practice, the number of terminal devices and servers is not specifically limited in the embodiments of this application.
[0076] The following describes the computing resource allocation method provided by the exemplary embodiments of this application in conjunction with the application scenarios described above and with reference to the accompanying drawings. It should be noted that the above application scenarios are only shown to facilitate understanding of the spirit and principles of this application, and the embodiments of this application are not limited in any way in this respect.
[0077] Figure 2 is a flowchart of a computing resource allocation method provided in an embodiment of this application. This process is applied to an electronic device, which can be a terminal device or a server. The specific process is as follows:
[0078] S201. When an execution operation is triggered for the target computing task, the task configuration information of the target computing task is obtained; wherein, the task configuration information includes: the target resource quantity and task-related information, and the task-related information represents: the execution requirements during the execution of the target computing task.
[0079] The target computing task refers to the computing task that needs to be performed. The target computing task can be the computing task of an AI model, including but not limited to: visual models, speech models, game models, medical models, natural language processing (NLP) models, etc. The computing task of an AI model usually refers to its training task. For example, the computing task of a speech model refers to its training task, which can also be called a speech training task. Speech training requires a large amount of audio data and corresponding text annotations; therefore, the speech training task is to train the model using audio data and the corresponding text annotations.
[0080] In some implementations, the computational task of the AI model includes multiple subtasks, and the target computational task can be any one of these subtasks. For example, a speech training task includes a speech recognition subtask and a speech synthesis subtask, where the speech recognition subtask is used to convert speech into text, and the speech synthesis subtask is used to convert text into speech. For each subtask, the computational resource allocation method provided in the embodiments of this application can be used to implement the execution of the subtask, and the target computational task can be either the speech recognition subtask or the speech synthesis subtask.
[0081] In one possible implementation, the execution operation for the target computing task can be triggered when the user triggers a resource request operation for the target computing task, or when the task configuration information for the target computing task is configured. In addition, the task configuration information may include a set task execution time. When the task execution time is reached, the execution operation for the target computing task can be triggered.
[0082] Task configuration information is used for scheduling computing resources. This information can be pre-set or entered when an operation is triggered; there are no restrictions on this. Task-related information can be set according to actual application requirements. For a type of computing task, the configured task information can be the same or different. For example, for voice services, the task information can indicate high computing stability or support for elastic scheduling.
[0083] The target resource quantity represents the resource request quantity for the target computing task. In one possible implementation, the resource request unit for the target resource quantity can be the whole-machine resource or the whole-card resource. A card refers to an accelerator card, which is equivalent to a computing unit. When the resource request unit is the whole-machine resource, the request is for the computing power of an entire heterogeneous computing device to be allocated to the target computing task. For example, the target resource quantity could be the computing resources of all GPUs in a heterogeneous computing device. When the resource request unit is the whole-card resource, the request is for the computing power of an entire accelerator card to be allocated to the target computing task. For example, the target resource quantity could be the entire computing resources of a single GPU. The resource request unit can also be 0.x card resources, in which case the request is for a portion of the computing power of a computing unit to be allocated to the target computing task.
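To compare requests expressed in different units, it can help to normalize them to a single unit such as cards. The Python sketch below is one hypothetical way to do that; the `RequestUnit` names, the default of 8 cards per machine, and the validation rules are assumptions for illustration, not details taken from this application.

```python
from enum import Enum


class RequestUnit(Enum):
    WHOLE_MACHINE = "whole_machine"      # all cards of one computing device
    WHOLE_CARD = "whole_card"            # one entire accelerator card
    FRACTIONAL_CARD = "fractional_card"  # "0.x card": part of one card


def requested_cards(unit: RequestUnit, count: float, cards_per_machine: int = 8) -> float:
    """Normalize a resource request to a number of cards."""
    if unit is RequestUnit.WHOLE_MACHINE:
        return count * cards_per_machine
    if unit is RequestUnit.WHOLE_CARD:
        if count != int(count):
            raise ValueError("whole-card requests must be a whole number of cards")
        return count
    if not 0 < count < 1:
        raise ValueError("fractional-card requests must be between 0 and 1")
    return count
```

Under these assumptions, one whole machine normalizes to 8 cards, while a 0.5-card request stays at 0.5.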
[0084] Task-related information characterizes the execution requirements during the execution of the target computing task. Execution requirements can be described using, but are not limited to, at least one evaluation dimension. Evaluation dimensions include, but are not limited to, one or more of the following: computational stability, computational performance, computational cost, tolerance for elastic scheduling, and whether fine-grained resource types are used. Computational stability, also known as computing power stability, refers to the ability of a computing system to maintain consistent performance output and avoid performance fluctuations and failures under long-term operation or high load. For a target computing task requiring high computational stability, the computing power needs to remain normal throughout the execution of the task so that it can run continuously. Computational performance, also known as computational efficiency, refers to the ability of a target computing task to converge to a good solution as quickly as possible under limited time and resources. For a target computing task requiring high computational performance, the execution process needs to be uninterrupted to ensure continuous computation. Computational cost represents the expenses incurred when using computing resources, including but not limited to one or more of hardware procurement, power consumption, software licensing, cloud service fees, and labor costs. Lower computational costs usually indicate higher computational efficiency. Elastic scheduling means that the execution of a target computing task can be interrupted and rescheduled. Using fine-grained resource types means executing the task on fine-grained resources such as virtualized resources or offline mixed-use resources.
[0085] In one possible implementation, the resource request unit for the target resource quantity can be the entire machine resource or the entire card resource, while the task-related information indicates that the target computing task supports fine-grained resource types, such as 0.x card resources. In the subsequent resource allocation process, virtual computing resources can also be allocated to the target computing task.
[0086] In one possible implementation, the resource request unit for the target resource quantity can be the entire machine resource, the entire card resource, or the 0.x card resource. The task-related information indicates that the target computing task supports mixed resources. Therefore, in the subsequent resource allocation process, mixed resources can also be allocated to the target computing task for use.
[0087] In this embodiment of the application, computing resources can be allocated using S201-S204 for each computing task. The computing resource allocation process is the same for each computing task and will not be described in detail here.
[0088] For example, as shown in Figure 3, the computing tasks include speech training tasks, vision training tasks, game training tasks, medical training tasks, NLP training tasks, and so on. The speech training task is used to train the speech model, the vision training task is used to train the vision model, the game training task is used to train the game model, the medical training task is used to train the medical model, and the NLP training task is used to train the NLP model. Taking the speech training, vision training, and game training tasks as examples, dotted patterns represent requested or allocated computing resources, while diagonal lines represent mixed-use resources available to other computing tasks. Assume a single heterogeneous device contains 8 CPUs. For the speech training task, the target resource amount is the computing power of 8 CPUs. The task-related information indicates that the speech training task supports mixed-use resource mining, and the actual resource allocation is 8 CPUs. During the execution of the speech training task, the idle resources of the 8 CPUs can be used as mixed-use resources by other computing tasks. For the vision training task, the target resource amount is the computing power of 3 CPUs. The task-related information indicates that the vision training task has high stability requirements and does not support mixed-use resource mining, and the actual resource allocation is 3 CPUs. During the execution of the vision training task, the idle resources of the 3 CPUs are not used by other computing tasks; that is, the vision training task exclusively uses the computing power of 3 CPUs. For the game training task, the target resource amount is the computing power of 2.5 CPUs. The task-related information indicates that the game training task supports fine-grained resource types; therefore, the actual resource allocation could be the full computing power of 2 CPUs plus a 0.5 card resource.
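The game training example above (2.5 cards served as 2 whole cards plus a 0.5 card resource) can be illustrated with a minimal sketch. The function name `split_request` is hypothetical, and rounding up to whole cards when fine-grained resource types are not supported is an assumption of this sketch rather than something stated in this application.

```python
from fractions import Fraction


def split_request(cards: Fraction, supports_fine_grained: bool):
    """Split a request expressed in cards into (whole_cards, fractional_card).

    Hypothetical helper: if the task does not support fine-grained resource
    types, the fractional part is rounded up to one more whole card
    (an assumption of this sketch).
    """
    if cards <= 0:
        raise ValueError("the requested amount must be positive")
    whole = cards.numerator // cards.denominator  # number of complete cards
    fraction = cards - whole                      # remaining 0.x card share
    if fraction and not supports_fine_grained:
        return whole + 1, Fraction(0)
    return whole, fraction


# Game training task from the example: 2.5 cards, fine-grained types supported.
whole_cards, fractional_card = split_request(Fraction(5, 2), True)
```

Exact fractions are used instead of floating-point numbers so that shares such as 0.1 card add up without rounding error.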
[0089] S202. Based on the preset association between task information and resource type, select at least one candidate resource type associated with the task-related information from each reference resource type; wherein each reference resource type is a physical computing resource or a virtual computing resource, and the virtual computing resource is obtained by dividing the physical computing resource using virtualization technology.
[0090] In this embodiment of the application, computing resources refer to the various hardware and software resources in a computing device used to perform computing tasks.
[0091] In one possible implementation, physical computing resources include at least one of whole-machine resources and whole-card resources. Whole-machine resources represent the computing resources provided by a single computing device, while whole-card resources represent the computing resources provided by a portion of the computing units within a single computing device. In this embodiment, the computing device can also be referred to as a single-machine heterogeneous device. A single-machine heterogeneous device represents a device that supports multiple resource types of computing resources, including but not limited to whole-machine resource types, whole-card resource types, virtualization resources, and hybrid deployment resources.
[0092] In one possible implementation, the virtual computing resources include virtualized resources, which represent the computing resources provided by a portion of the logical units within a computing unit. The computing resources provided by the logical units can also be called virtual resources or logical resources. By using virtualization technology to virtualize the computing unit, the physical resources within the computing unit can be divided into multiple virtual resources. In this embodiment, the virtualized resources can also be called 0.x card resources. Through virtualized resources, the entire card's resources can be provided to multiple computing tasks, thereby enhancing the flexibility of resource allocation and improving resource utilization.
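As an illustration of dividing one computing unit into logical units, the following minimal sketch models a card split into ten equal 0.1-card units. The class name `VirtualizedCard`, the equal-size split, and the bookkeeping by task name are assumptions made for illustration; the application does not prescribe a particular virtualization mechanism.

```python
from fractions import Fraction


class VirtualizedCard:
    """One physical card divided into equal logical units (hypothetical model)."""

    def __init__(self, card_id: str, units: int = 10):
        self.card_id = card_id
        self.unit_size = Fraction(1, units)  # e.g. 0.1 card per logical unit
        self.owner = [None] * units          # task name per unit, None if free

    def free_share(self) -> Fraction:
        """Fraction of the card that is not yet assigned to any task."""
        return self.unit_size * self.owner.count(None)

    def allocate(self, task: str, share: Fraction) -> bool:
        """Assign `share` of the card to `task`; return False if it does not fit."""
        if share <= 0:
            raise ValueError("share must be positive")
        needed = share / self.unit_size
        if needed.denominator != 1:
            raise ValueError("share must be a multiple of the unit size")
        free = [i for i, owner in enumerate(self.owner) if owner is None]
        if len(free) < needed:
            return False
        for i in free[: int(needed)]:
            self.owner[i] = task
        return True


card = VirtualizedCard("GPU8")
card.allocate("task-a", Fraction(3, 10))  # a 0.3 card resource
card.allocate("task-b", Fraction(1, 2))   # a 0.5 card resource
```

After these two allocations, 0.2 of the card remains free, so several computing tasks share one card.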
[0093] In one possible implementation, physical computing resources also include offline mixed-use resources. Offline mixed-use resources represent idle resources among the allocated resources used for other computing services. These allocated resources include physical computing resources and / or virtual computing resources. Since the resource requests from computing services are typically greater than the actual resource usage, and in some cases, even far exceed the actual usage, in this embodiment, unused computing resources allocated to one computing task are allocated to other computing services (such as the target computing task), fully utilizing idle computing resources and thus improving resource utilization.
[0094] Refer to Table 1, which lists the resource types and their descriptions provided in the embodiments of this application. The resource types include: whole-machine resources, whole-card resources, virtualization resources, and offline mixed-deployment resources. The resource application unit for whole-machine resources is a single-machine heterogeneous device. A single-machine heterogeneous device refers to a computing device that integrates multiple different types of computing resources (such as CPU, GPU, ASIC, etc.). For example, if the computing device includes 8 GPUs, then the whole-machine resources include the computing resources provided by the 8 GPUs. The resource application unit for whole-card resources is the computing unit within a single-machine heterogeneous device. Taking a computing device including 8 GPUs as an example, the whole-card resources include the computing resources provided by GPU1 to GPU7. The resource application unit for virtualization resources is the logical unit within the computing unit. For example, GPU8 is divided into 10 logical units, which are referred to as 0.1 card, 0.2 card, 0.3 card, etc. Virtualization resources include the computing resources provided by cards 0.1 to 0.9. Offline mixed-deployment resources are resources obtained through mixed-deployment mining, that is, computing resources from whole-machine resources, whole-card resources, and virtualization resources that have been allocated to other computing services but are not used. It should be noted that the use of offline mixed-deployment resources must not affect the normal execution of those other computing services; therefore, if the target computing task uses mixed-deployment resources, the service quality of the target computing task cannot be guaranteed.
[0095] Table 1 (Resource Types and Their Descriptions)
[0096]
Resource type | Description
Whole-machine resources | The resource application unit is the heterogeneous whole machine.
Whole-card resources | The resource application unit is the heterogeneous card.
Virtualized resources | The resource application unit is a heterogeneous 0.x card.
Offline mixed-deployment resources | The resources are obtained through mixed-deployment mining.
[0097] One possible implementation involves using resource models to describe and manage computing resources. Specifically, for each resource type, a corresponding resource model is set up to manage the computing resources of that type. For example, the resource models include: a whole-machine resource model for whole-machine resource management, a whole-card resource model for whole-card resource management, a virtualization resource model for virtualization resource management, and a mixed-deployment resource model for mixed-deployment resource management.
[0098] A resource model is an abstract framework for describing and managing computing resources. It defines information such as the resource type, attributes, behaviors, and relationships between computing resources to ensure efficient utilization, flexible scheduling, and security of computing resources.
[0099] Resource attributes are used to characterize resource characteristics such as the status, configuration, and performance metrics of computing resources. For example, resource attributes can be key-value pairs describing resource characteristics. Resource attributes include, but are not limited to: resource identifiers, which are used to distinguish different resource instances and can be represented by, but are not limited to, IDs; resource status, which characterizes the current state of the computing resource, such as running, stopped, or faulty; resource specifications, which are the configuration information of the computing resource, such as the number of CPU cores, memory size, and disk capacity; and resource location, which indicates the geographical location or data center where the computing resource is located.
[0100] Resource relationships describe the dependencies and connections between different computing resources. Resource relationships include, but are not limited to: parent-child relationships, where one computing resource depends on the existence of another computing resource (e.g., a virtual machine depends on the computing resources of its host machine); dependency relationships, where the use of one computing resource requires the computation results of other computing resources (e.g., an application depends on a database service); and sharing relationships, where multiple computing resources can share the same resource (e.g., multiple virtual machines can share the same storage volume).
[0101] Resource operations refer to various management and control operations performed on computing resources. Resource operations include, but are not limited to: creation operations for creating new resource instances; reading operations for obtaining detailed information and status of computing resources; update operations for modifying the configuration or attributes of computing resources; deletion operations for destroying resource instances and releasing occupied resources; on / off operations for controlling the start / stop status of computing resources; and adjustment operations for dynamically adjusting the resource scale, such as increasing or decreasing CPU, memory, etc.
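The resource attributes, resource relationships, and resource operations described above can be sketched as a small data model. All names below (`ResourceInstance`, `ResourceModel`, the `spec` keys) are hypothetical and chosen only for illustration; only the parent-child relationship is modeled, and the adjustment operation is represented by `update`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ResourceInstance:
    resource_id: str                                     # resource identifier
    status: str = "stopped"                              # running / stopped / faulty
    spec: Dict[str, int] = field(default_factory=dict)   # resource specification
    location: str = ""                                   # data center or region
    parent_id: Optional[str] = None                      # parent-child relationship


class ResourceModel:
    """Manages the instances of one resource type (hypothetical sketch)."""

    def __init__(self, resource_type: str):
        self.resource_type = resource_type
        self._instances: Dict[str, ResourceInstance] = {}

    def create(self, resource_id: str, parent_id: Optional[str] = None, **spec):
        if resource_id in self._instances:
            raise KeyError(f"{resource_id} already exists")
        instance = ResourceInstance(resource_id, spec=dict(spec), parent_id=parent_id)
        self._instances[resource_id] = instance
        return instance

    def read(self, resource_id: str) -> ResourceInstance:
        return self._instances[resource_id]

    def update(self, resource_id: str, **spec) -> None:
        # Also serves as the adjustment operation (scaling a specification).
        self._instances[resource_id].spec.update(spec)

    def set_running(self, resource_id: str, on: bool) -> None:
        self._instances[resource_id].status = "running" if on else "stopped"

    def delete(self, resource_id: str) -> None:
        del self._instances[resource_id]


model = ResourceModel("whole_card")
model.create("gpu-1", memory_gb=80)
model.set_running("gpu-1", True)
```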
[0102] In this embodiment, the preset association between task information and resource types records the association between task-related information and reference resource types. If an item of task information is associated with a resource type, the target computing task can be executed using the computing resources corresponding to the associated resource type. An item of task information can be associated with one or more resource types. Candidate resource types can also be regarded as recommended resource types.
[0103] For example, Table 2 illustrates the association between task information and resource types provided in this application embodiment. If the task information indicates that the target computing task has high stability requirements, then the candidate resource type can be a whole machine resource, a whole card resource, or a virtualized resource. In addition, whether the computing resources corresponding to the candidate resource type support mixed-distribution mining can be evaluated based on the business scenario. If the task information indicates that the target computing task has high computing performance requirements, then the candidate resource type can be a whole machine resource, a whole card resource, or a virtualized resource. Again, whether the computing resources corresponding to the candidate resource type support mixed-distribution mining can be evaluated based on the business scenario. If the task information indicates that the target computing task accepts elastic scheduling, then the candidate resource type can be a whole machine resource, a whole card resource, or a virtualized resource. In this case, the whole machine resources, whole card resources, or virtualized resources need to accept mixed-distribution mining, which means that they can be used for the execution of other computing services. If the task information indicates that the target computing task has low computing cost requirements, then the candidate resource type can be a virtualized resource or an offline mixed-distribution resource.
[0104] Table 2 (Correspondence between execution requirements and resource types)
[0105]
[0106]
[0107] In one possible implementation, at least one requirement point can be extracted from task-related information. A pre-defined association between task information and resource types records the association between requirement points and candidate resource types. The candidate resource type corresponding to each of the at least one requirement point extracted from the task-related information is used as the candidate resource type associated with the task-related information.
[0108] Specifically, based on the keywords contained in the task-related information, at least one resource requirement point is extracted from the task-related information; based on the mapping relationship between resource requirement points and resource types, at least one candidate resource type corresponding to each resource requirement point is selected from each reference resource type, and this candidate resource type is used as at least one candidate resource type associated with the task-related information. Resource requirement points can also be simply referred to as requirement points.
[0109] Each requirement point is used to characterize a type of execution requirement. See Table 3, which contains the requirement points and their descriptions provided in the embodiments of this application. The types of requirement points include, but are not limited to: high computing power stability, high computing efficiency, low computing cost, support for elastic scheduling, and fine-grained resource types.
[0110] Table 3 (Requirements and their descriptions)
[0111]
Requirement point | Description
High computing power stability | Computing power must remain normal during training so that computation can proceed continuously.
High training efficiency | The training process must not be disturbed and requires continuous computation.
Low cost (high utilization rate) | Low-cost computing power is required, with relatively weak requirements for stability and efficiency.
Supports elastic scheduling | The computation process can be interrupted and rescheduled.
Fine-grained resource types | The model or data volume is small and cannot fully utilize the whole machine, so fine-grained computing power is required.
[0112] In the process of keyword extraction, AI models can be used to perform semantic analysis on task-related information to determine resource demand points. Alternatively, the word segments contained in the task-related information can be matched with the demand points of each evaluation dimension. If a demand point of a certain evaluation dimension is matched, it will be used as the extracted resource demand point.
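The word-matching variant of requirement point extraction, followed by the lookup of candidate resource types, can be sketched as follows. The keyword lists are hypothetical, the mapping follows the prose description of Table 2, and the entry for fine-grained resource types (mapped here to virtualized resources) is an assumption of this sketch. Candidate types of several requirement points are merged as a union.

```python
# Hypothetical keywords per requirement point (substring match on lowercase text).
REQUIREMENT_KEYWORDS = {
    "high_stability": ("stable", "stability"),
    "high_efficiency": ("efficiency", "performance"),
    "low_cost": ("low cost", "low-cost"),
    "elastic_scheduling": ("elastic", "interruptible"),
    "fine_grained": ("fine-grained", "0.x card"),
}

# Requirement point -> candidate resource types, following the description of Table 2.
# The "fine_grained" entry is an assumption of this sketch.
REQUIREMENT_TO_TYPES = {
    "high_stability": ("whole_machine", "whole_card", "virtualized"),
    "high_efficiency": ("whole_machine", "whole_card", "virtualized"),
    "elastic_scheduling": ("whole_machine", "whole_card", "virtualized"),
    "low_cost": ("virtualized", "offline_mixed"),
    "fine_grained": ("virtualized",),
}


def extract_requirement_points(task_info: str) -> list:
    """Return the requirement points whose keywords occur in the task information."""
    text = task_info.lower()
    return [
        point
        for point, keywords in REQUIREMENT_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


def candidate_resource_types(task_info: str) -> list:
    """Union of the resource types associated with each extracted requirement point."""
    candidates = []
    for point in extract_requirement_points(task_info):
        for resource_type in REQUIREMENT_TO_TYPES[point]:
            if resource_type not in candidates:
                candidates.append(resource_type)
    return candidates


types_for_speech = candidate_resource_types("Requires high stability during training")
```

A semantic-analysis model could replace `extract_requirement_points` without changing the lookup step.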
[0113] For example, as shown in Figure 4, assuming the target computing task is a voice service, the task-related information in the task configuration information indicates that the target computing task requires high computing power stability. Therefore, based on the preset relationship between task information and resource type, the following four reference resource types are selected as candidate resource types: whole machine resources, whole card resources, virtualization resources, and offline mixed resources.
[0114] S203. Obtain the schedulable amount of computing resources corresponding to at least one candidate resource type.
[0115] The schedulable amount refers to the amount of computing resources available for the target computing task.
[0116] In one possible implementation, at least one set of computing resources can be divided from a resource pool containing various types of computing resources; then, based on the total amount of resources and resource usage of each of the at least one set of computing resources, the schedulable amount of computing resources corresponding to each of the at least one candidate resource types can be obtained.
[0117] Wherein, at least one set of computing resources includes a set of computing resources corresponding to at least one candidate resource type.
[0118] A resource pool refers to the abstraction of computing resources (such as CPU, memory, GPU, FPGA, etc.) into a logical collection for multiple virtual machines, containers, or applications to use on demand. Through resource pools, computing resources can be centrally managed and scheduled, improving resource utilization, enhancing flexibility and scalability, and simplifying resource management.
[0119] In this embodiment of the application, when allocating resources for a target computing task, computing resources can be taken from the resource pool and allocated to the target computing task for use.
[0120] In one possible implementation, a set of computing resources can contain computing resources corresponding to multiple candidate resource types. For example, as shown in Figure 5, the resource pool is divided into a dedicated resource set, a virtual resource set, and a mixed-deployment resource set. The dedicated resource set is a set of computing resources that do not support mixed-deployment mining. Dedicated resources can be, but are not limited to, any one of whole machine resources or whole card resources, and can also include both whole machine resources and whole card resources. The virtual resource set is a set of computing resources that are virtualized and used to support resource allocation for 0.x cards. The mixed-deployment resource set is a set of computing resources that support mixed-deployment mining.
[0121] As shown in Figure 6, the mixed-deployment resource set can specifically include a high-priority mixed-deployment resource subset and a low-priority mixed-deployment resource subset. The high-priority mixed-deployment resource subset is a set of high-priority mixed-deployment resources, and the low-priority mixed-deployment resource subset is a set of low-priority mixed-deployment resources. Low-priority mixed-deployment resources are offline mixed-deployment resources, which include idle resources from the whole machine resources, whole card resources, and virtualization resources that have been allocated to other computing tasks. High-priority mixed-deployment resources include whole machine resources, whole card resources, and virtualization resources that have been allocated to other computing tasks. In Figure 6, diagonal lines represent low-priority mixed-deployment resources, dotted patterns represent computing resources already allocated to other computing tasks (i.e., high-priority mixed-deployment resources), and blank spaces represent unallocated computing resources. The total amount of computing resources allocated to other computing tasks and unallocated computing resources can be called the total amount of mixed-deployment computing resources. High-priority mixed-deployment resources are mainly used to support online services, i.e., computing tasks that require real-time responses to user requests. These computing tasks are usually latency-sensitive, requiring the system to complete tasks quickly and provide consistent performance. Low-priority mixed-deployment resources are mainly used to support offline computing tasks, i.e., computing tasks that do not require real-time responses. These tasks can usually be executed in the background, allowing for longer processing times and tolerating a certain degree of latency. In some implementations, unallocated computing resources can also be used as low-priority mixed-deployment resources for offline computing tasks.
[0122] In another possible implementation, a set of computing resources may include computing resources corresponding to only one candidate resource type.
[0123] For example, the resource pool contains computing resources of four types: whole-machine resources, whole-card resources, virtualization resources, and offline mixed-deployment resources. From the resource pool, four computing resource sets are divided: a dedicated set of whole-machine resources, a dedicated set of whole-card resources, a virtualization resource set, and a mixed-deployment resource set. Specifically, the dedicated set of whole-machine resources consists of whole-machine computing resources that do not support mixed-deployment mining; the dedicated set of whole-card resources consists of whole-card computing resources that do not support mixed-deployment mining; the virtualization resource set is a set of virtualized computing resources used to support resource allocation for 0.x cards; and the mixed-deployment resource set is a set of computing resources that support mixed-deployment mining.
[0124] In one possible implementation, the schedulable quantity can be obtained in, but is not limited to, the following way:
[0125] First, obtain the resource scheduling view corresponding to at least one set of computing resources. Each resource scheduling view contains the total amount of resources and the resource usage of the corresponding set of computing resources.
[0126] Secondly, based on the total amount of resources and resource usage in the resource scheduling view, the schedulable amount of computing resources corresponding to at least one candidate resource type is obtained.
[0127] In other words, for each set of computing resources, a corresponding resource scheduling view can be drawn. Using the resource scheduling view, the schedulable amount of computing resources can be clearly presented, which is conducive to improving the allocation efficiency of computing resources.
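The two steps above can be sketched as follows: each resource scheduling view records a total and a usage figure, and the schedulable amount per candidate resource type is derived from them. The class `SchedulingView`, the use of cards as the unit, and the formula "total minus used" are assumptions of this sketch.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, Iterable


@dataclass
class SchedulingView:
    """Total and used amounts (in cards) of one set of computing resources."""
    resource_type: str
    total: Fraction
    used: Fraction

    def schedulable(self) -> Fraction:
        # Assumed definition: whatever is not in use can still be scheduled.
        return max(self.total - self.used, Fraction(0))


def schedulable_by_type(views: Iterable[SchedulingView],
                        candidates: Iterable[str]) -> Dict[str, Fraction]:
    """Sum the schedulable amounts of all views belonging to a candidate type."""
    wanted = set(candidates)
    result: Dict[str, Fraction] = {}
    for view in views:
        if view.resource_type in wanted:
            previous = result.get(view.resource_type, Fraction(0))
            result[view.resource_type] = previous + view.schedulable()
    return result


views = [
    SchedulingView("whole_card", Fraction(8), Fraction(5)),
    SchedulingView("virtualized", Fraction(1), Fraction(3, 10)),
    SchedulingView("offline_mixed", Fraction(4), Fraction(4)),
]
amounts = schedulable_by_type(views, ["whole_card", "virtualized"])
```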
[0128] Resource usage is used to characterize the actual amount of computing resources used and the amount of idle resources.
[0129] Figure 7 is a schematic diagram of a resource scheduling view of a mixed resource set provided in an embodiment of this application. The resource scheduling view includes high-priority mixed resources, unallocated resources, and low-priority mixed resources. Figure 7 still uses diagonal lines to represent low-priority mixed resources, dotted patterns to represent computing resources that have been allocated to other computing tasks (i.e., high-priority mixed resources), and blank spaces to represent unallocated computing resources (i.e., unallocated resources). The total amount of high-priority mixed resources and unallocated resources can be called the total amount of mixed resources.
[0130] It should be noted that, in this embodiment of the application, since the low-priority mixed resources are used for offline jobs and offline tasks do not need to be executed simultaneously, the amount of low-priority mixed resources allocated can exceed the total amount of low-priority mixed resources during the allocation process.
[0131] In this embodiment, high-priority mixed-use resources and low-priority mixed-use resources use the same physical resources, but their resource views are independent. For high-priority mixed-use resources, the schedulable amount can be statically allocated based on whole-machine resources, whole-card resources, or virtualization resources. The available amount for low-priority mixed-use resources is the total amount of whole-machine resources, whole-card resources, or virtualization resources, minus the resources already used by online jobs. Taking whole-machine resources as an example, the schedulable amount for high-priority mixed-use resources is statically allocated based on the whole-machine resources, while the available amount for low-priority mixed-use resources is the whole-machine resources minus the resources already used by online jobs. By mining these recyclable resources and filling them with low-priority offline computing jobs, resource utilization is improved. The resource scheduling view displays, in graphical form, the resource quantities corresponding to high-priority resources, idle resources, and low-priority resources, facilitating subsequent resource allocation and improving resource allocation efficiency.
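The relationship between the two independent views over the same physical resources can be shown with a small worked sketch. The function `mixed_views` and its parameter names are hypothetical; interpreting the statically allocated high-priority view as "total minus the amount already allocated to online jobs" is an assumption of this sketch, while the low-priority figure follows the text (total minus the resources actually used by online jobs).

```python
from fractions import Fraction
from typing import Dict


def mixed_views(total: Fraction, online_allocated: Fraction,
                online_used: Fraction) -> Dict[str, Fraction]:
    """Return both views (in cards) for one mixed-deployment resource set."""
    if not 0 <= online_used <= online_allocated <= total:
        raise ValueError("expected 0 <= online_used <= online_allocated <= total")
    return {
        # Static view: based on what online jobs have requested.
        "high_priority_schedulable": total - online_allocated,
        # Dynamic view: based on what online jobs actually use.
        "low_priority_available": total - online_used,
    }


# 8 cards in total, 6 allocated to online jobs, of which only 2.5 are in use.
example = mixed_views(Fraction(8), Fraction(6), Fraction(5, 2))
```

In this example 2 cards remain schedulable for high-priority tasks, while 5.5 cards can be filled with low-priority offline jobs; the difference of 3.5 cards is the idle share recovered through mixed-deployment mining.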
[0132] In one possible implementation, after determining the target resource type, the resource status of the computing resource corresponding to the target resource type can be updated from idle to allocated. Correspondingly, after the target computing task is completed, the resource status of the computing resource corresponding to the target resource type is updated back to idle. Here, the computing resource corresponding to the target resource type refers to the computing resource used to execute the target computing task among the computing resources belonging to the target resource type.
[0133] By monitoring changes in the resource status of computing resources, it becomes easier to manage and allocate these resources. When resource allocation is required, the available amount of resources can be quickly determined using the resource status, thereby improving resource allocation efficiency.
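The status transitions described above (idle to allocated when a task is scheduled, and back to idle when it completes) can be sketched with a simple tracker. The class `ResourceStatusTracker` and the resource identifiers are hypothetical.

```python
class ResourceStatusTracker:
    """Tracks the idle / allocated status of individual resources (sketch)."""

    def __init__(self, resource_ids):
        self._status = {rid: "idle" for rid in resource_ids}

    def idle_count(self) -> int:
        """Number of resources that can currently be allocated."""
        return sum(1 for status in self._status.values() if status == "idle")

    def allocate(self, count: int) -> list:
        """Mark `count` idle resources as allocated; return [] if too few are idle."""
        idle = [rid for rid, status in self._status.items() if status == "idle"]
        if len(idle) < count:
            return []
        chosen = idle[:count]
        for rid in chosen:
            self._status[rid] = "allocated"
        return chosen

    def release(self, resource_ids) -> None:
        """Called when the computing task completes."""
        for rid in resource_ids:
            self._status[rid] = "idle"


tracker = ResourceStatusTracker(["gpu-0", "gpu-1", "gpu-2"])
granted = tracker.allocate(2)
```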
[0134] S204. If, among the at least one candidate resource type, there exists a target resource type whose schedulable quantity satisfies the target resource quantity, then the target computing task is executed using the computing resources corresponding to the target resource type.
[0135] In this embodiment of the application, the schedulable quantity satisfies the target resource quantity, meaning the schedulable quantity is greater than or equal to the target resource quantity. For example, if the target resource quantity is the computing resources provided by 8 GPUs, and the schedulable quantity is the computing resources provided by 10 GPUs, then the schedulable quantity satisfies the target resource quantity.
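The condition above (schedulable quantity greater than or equal to the target resource quantity) can be sketched as a simple selection function. The name `pick_target_type` is hypothetical, and taking the first satisfying candidate in list order is an assumption of this sketch; the application does not state a tie-breaking rule.

```python
from typing import Dict, Optional, Sequence


def pick_target_type(candidates: Sequence[str],
                     schedulable: Dict[str, float],
                     target_quantity: float) -> Optional[str]:
    """Return the first candidate type whose schedulable amount covers the request."""
    for resource_type in candidates:
        if schedulable.get(resource_type, 0) >= target_quantity:
            return resource_type
    return None  # no candidate type can satisfy the request right now


# Example from the text: 8 GPUs requested, 10 GPUs schedulable.
chosen = pick_target_type(["whole_card", "virtualized"],
                          {"whole_card": 10, "virtualized": 1}, 8)
```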
[0136] One possible implementation involves utilizing computing resources corresponding to the target resource type to execute the target computing task, including but not limited to the following methods:
[0137] By utilizing the computing resources corresponding to the target resource type, a target computing power container is constructed. The target computing power container is used to provide a runtime environment for executing the target computing tasks.
[0138] Execute the target computing task within the target computing power container.
[0139] The process of building the target computing power container can also be referred to as the scheduling and production process of the computing power container. A computing power container is a virtualization technology used to encapsulate and manage computing resources, aiming to provide an isolated and portable runtime environment for computing tasks.
[0140] The target computing power container can be a dedicated resource container, a high-priority mixed-deployment container, a low-priority mixed-deployment container, or a virtualized container. Referring to Table 4, dedicated resource containers are typically used for computing tasks with high stability and performance requirements, such as pre-training large models or inference services with high real-time requirements; high-priority mixed-deployment containers are typically used for computing tasks, such as deployed inference or training services, that have low utilization but require quality-of-service guarantees; low-priority mixed-deployment containers are typically used for computing tasks that require low cost and whose computation can be interrupted; virtualized containers are typically used for computing tasks with small models or data volumes that cannot fully utilize an entire GPU.
[0141] Table 4 (Computing Power Containers and Their Descriptions)
[0142]
Type of computing power container | Recommended use case
Dedicated resource container | Computing tasks requiring high stability and performance.
High-priority mixed-deployment container | Computing tasks with low computing power utilization but requiring quality-of-service assurance.
Low-priority mixed-deployment container | Computing tasks requiring low cost and whose computation can be interrupted.
Virtualized container | Computing tasks whose model or data volume is too small to fully utilize an entire card.
[0143] In one possible implementation, when constructing the target computing power container using the computing resources corresponding to the target resource type, the container can be constructed based on the target resource type and on the configured recommended usage scenarios of the computing power containers, matched against the task-related information of the target computing task. For example, if the target resource type is a whole-GPU resource and the task-related information indicates that mixed-deployment mining is supported, then a high-priority mixed-deployment container can be constructed. Alternatively, if the target resource type is a whole-GPU resource and the task-related information indicates that mixed-deployment mining is not supported, then a dedicated resource container can be constructed.
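The matching of target resource type and task-related information to a container type can be sketched as a small decision function. The identifiers are hypothetical, and treating whole-machine resources the same way as whole-card resources, as well as ignoring mixed-deployment mining for virtualized resources, are simplifying assumptions of this sketch.

```python
def choose_container(target_type: str, supports_mixed_mining: bool) -> str:
    """Map a target resource type to a computing power container type (sketch)."""
    if target_type == "offline_mixed":
        # Offline mixed-deployment resources are served by low-priority containers.
        return "low_priority_mixed"
    if target_type == "virtualized":
        return "virtualized"
    if target_type in ("whole_machine", "whole_card"):
        # Idle capacity may be mined by other tasks only if the task allows it.
        return "high_priority_mixed" if supports_mixed_mining else "dedicated"
    raise ValueError(f"unknown resource type: {target_type}")


container = choose_container("whole_card", supports_mixed_mining=True)
```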
[0144] In one possible implementation, the task configuration information also includes: the task priority of the target computation task;
[0145] Then, using the computing resources corresponding to the target resource type, a target computing power container is constructed, including:
[0146] If the target resource type is an offline mixed resource, the resource allocation order of the target computing task is determined according to the task priority.
[0147] Based on the resource allocation order, and utilizing the computing resources corresponding to the offline mixed-deployment resources, a low-priority mixed-deployment container is constructed as the target computing power container.
[0148] In this context, offline mixed-deployment resources represent idle resources among the allocated resources used for other computing services. These allocated resources include physical computing resources and / or virtual computing resources. Low-priority mixed-deployment containers can also be referred to as offline mixed-deployment containers.
[0149] Task priority can be represented numerically or by a hierarchy.
[0150] For example, if the target resource type is offline mixed-distribution resource, the target computing task has a high priority, and other unprocessed computing tasks using offline mixed-distribution resources have a low priority, then the resource allocation order of the target computing task can be adjusted so that its resource allocation time is earlier than that of other unprocessed tasks using offline mixed-distribution resources. During the resource allocation process for the target computing task, a low-priority mixed-distribution container is constructed using the computing resources corresponding to the offline mixed-distribution resources as the target computing power container, and this container is used to execute the target computing task.
[0151] For computational tasks that can be processed offline, they can usually be executed in the background, allowing for longer processing times and tolerating a certain degree of latency. Therefore, when there are multiple computational tasks that need to be processed, task priorities can be used to flexibly adjust the execution order of the computational tasks, thereby ensuring that each computational task is completed smoothly while meeting different task processing requirements.
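As an illustration of priority-based ordering for tasks that use offline mixed-deployment resources, the following sketch keeps pending tasks in a heap. It assumes that priority is numeric, that a larger number means a higher priority, and that ties are broken by submission order; the class and task names are hypothetical.

```python
import heapq
import itertools


class OfflineAllocationQueue:
    """Orders pending tasks so that higher-priority tasks get resources first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: earlier submission first

    def submit(self, task_name, priority):
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), task_name))

    def next_task(self):
        """Return the next task to allocate for, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, task_name = heapq.heappop(self._heap)
        return task_name


if __name__ == "__main__":
    queue = OfflineAllocationQueue()
    queue.submit("nightly-eval", priority=1)
    queue.submit("model-finetune", priority=5)
    queue.submit("log-embedding", priority=1)
    while (task := queue.next_task()) is not None:
        print(task)
```

In this sketch a newly submitted high-priority task moves ahead of unprocessed low-priority tasks, which corresponds to adjusting its resource allocation time to be earlier.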
[0152] The following example illustrates this application.
[0153] As shown in Figure 8, the computing tasks include training tasks for models such as speech models, vision models, medical models, NLP models, text models, and game models. The requirements for each computing task are analyzed, including: high computing power stability, high training efficiency, low utilization cost, support for elastic scheduling, and fine-grained resource types. Based on the relationship between each requirement and resource type, target resource types can be selected from four categories: whole machine resources, whole card resources, 0.x card resources, and offline mixed deployment resources. Then, computing resources of the target resource type are used to provide computing power services for the corresponding computing tasks.
[0154] As shown in Figure 9, assuming the computing tasks include speech training, vision training, medical training, NLP training, text training, and game training, each computing task is analyzed to obtain its requirements. These requirements include: high computing power stability, high training efficiency, low utilization cost, support for elastic scheduling, and fine-grained resource types. Then, based on the relationship between the requirements and resource types, candidate resource types are selected from whole machine resources, whole card resources, 0.x card resources, and offline mixed deployment resources. Multiple computing resource sets are then divided from the resource pool, and the schedulable amount of computing resources corresponding to each candidate resource type is obtained using the resource scheduling view corresponding to each set. Next, based on the schedulable amount of computing resources corresponding to the candidate resource types, the target resource type is determined, and containerized production is performed using the target resource type to build the target computing power container. The target computing power container can be a resource-exclusive container, a high-optimal mixed deployment container, a low-optimal mixed deployment container, or a virtualized container. Finally, the target computing power container is used to provide computing power services for the computing tasks.
[0155] Figure 10 is a flowchart of a computing resource allocation process provided in an embodiment of this application. The process includes:
[0156] S1001. Extract multiple requirements from task-related information.
[0157] First, the computing task's demands for computing resources are analyzed and summarized into a set of requirement points, which are then output.
[0158] S1002. From the various reference resource types, select the candidate resource types corresponding to each requirement point. The reference resource types include: whole card resources, whole machine resources, virtualization resources, and offline / mixed deployment resources.
[0159] S1003. From the resource pool, three types of computing resource sets are divided. The three types of computing resource sets include dedicated resource sets, virtualized resource sets, and mixed-distribution resource sets.
[0160] S1004. Based on the resource scheduling view of the three types of computing resource sets, obtain the schedulable amount of computing resources for the candidate resource types.
[0161] S1005. Read the target resource quantity.
[0162] S1006. Execute the target computing task using computing resources of the target resource type that can be scheduled to meet the target resource quantity.
[0163] For diverse AI computing scenarios, the process begins by summarizing and outputting the requirement points for computing resources. Based on these requirement points, the resource types required by heterogeneous computing services can be identified. Then, based on these resource types, the resource pool is processed logically to determine the schedulable amount of computing resources for the candidate resource types. For example, by logically processing the resource pool, available quotas can be constructed for exclusive resource scheduling, for 0.x card resources, and for high-priority and low-priority resources under mixed deployment scheduling. Next, the resource request (i.e., the target resource quantity) of the loaded computing task is read and matched against the candidate resource types. The matching condition can be whether the available quota meets the target resource quantity. If the available quota meets the business demand, the target computing power container is constructed using the matched target resource type, and the computing task is executed in it. If the available quota does not meet the business demand, an error message indicating insufficient resources can be returned to prompt the business side to handle the issue.
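The matching step described above (compare the available quota of each candidate resource type with the target resource quantity, then either proceed or return an insufficient-resources error) can be sketched as follows. The function and field names are hypothetical, quantities are assumed to be expressed in cards, and picking the first candidate that fits is an assumption, since the description does not fix a selection order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AllocationResult:
    ok: bool
    resource_type: Optional[str] = None
    error: Optional[str] = None


def match_resource_type(
    candidates: List[str],
    schedulable: Dict[str, float],
    target_quantity: float,
) -> AllocationResult:
    """Return the first candidate type whose schedulable amount covers the request."""
    for resource_type in candidates:
        # A type missing from the quota table is treated as having nothing available.
        if schedulable.get(resource_type, 0.0) >= target_quantity:
            return AllocationResult(ok=True, resource_type=resource_type)
    return AllocationResult(
        ok=False,
        error=f"insufficient resources: no candidate type can provide {target_quantity}",
    )


if __name__ == "__main__":
    quota = {"whole_card": 0.0, "virtualized": 2.5}
    print(match_resource_type(["whole_card", "virtualized"], quota, 1.0))
    print(match_resource_type(["whole_card", "virtualized"], quota, 8.0))
```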
[0164] Based on the same inventive concept, embodiments of this application provide a computing resource allocation device. Figure 11 is a structural schematic diagram of a computing resource allocation device 1100, which may include:
[0165] The information acquisition unit 1101 is used to acquire the task configuration information of the target computing task when an execution operation is triggered for the target computing task; wherein, the task configuration information includes: target resource quantity and task-related information, and the task-related information represents: the execution requirements during the execution of the target computing task;
[0166] The resource selection unit 1102 is used to select at least one candidate resource type associated with the task-related information from each reference resource type according to the preset association relationship between task information and resource type; wherein each reference resource type is a physical computing resource or a virtual computing resource, and the virtual computing resource is obtained by dividing the physical computing resource using virtualization technology;
[0167] The quota acquisition unit 1103 is used to acquire the schedulable amount of computing resources corresponding to each of the at least one candidate resource type;
[0168] The task execution unit 1104 is used to execute the target computing task by utilizing the computing resources corresponding to the target resource type if there is a target resource type among the at least one candidate resource type whose schedulable quantity satisfies the target resource quantity.
[0169] In one possible implementation, the computing resources of the physical class include at least one of whole machine resources and whole card resources, wherein the whole machine resources represent the computing resources provided by a computing device, and the whole card resources represent the computing resources provided by a portion of the computing units in a computing device.
[0170] In one possible implementation, the computing resources of the virtual class include: virtualized resources, which represent the computing resources provided by a portion of the logical units in a computing unit.
[0171] In one possible implementation, each of the reference resource types is a computing resource of the physical class, a computing resource of the virtual class, or an offline mixed-use resource. The offline mixed-use resource represents an idle resource among the allocated resources used for other computing services. The allocated resources include the computing resources of the physical class and / or the computing resources of the virtual class.
[0172] In one possible implementation, when obtaining the schedulable amount of computing resources corresponding to each of the at least one candidate resource type, the quota acquisition unit 1103 is specifically used for:
[0173] From a resource pool containing various types of computing resources, at least one set of computing resources is divided out, wherein the at least one set of computing resources includes the set of computing resources corresponding to the at least one candidate resource type;
[0174] Based on the total amount of resources and resource usage of each of the at least one set of computing resources, the schedulable amount of computing resources corresponding to each of the at least one candidate resource types is obtained.
[0175] In one possible implementation, when obtaining the schedulable amount of computing resources corresponding to each of the at least one candidate resource types based on the total resource amount and resource usage of each of the at least one set of computing resources, the quota acquisition unit 1103 is specifically used for:
[0176] Obtain the resource scheduling view corresponding to each of the at least one set of computing resources, and each resource scheduling view contains the total amount of resources and the resource usage of the corresponding set of computing resources;
[0177] Based on the total amount of resources and resource usage in the resource scheduling view, the schedulable amount of computing resources corresponding to each of the at least one candidate resource type is obtained.
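A minimal sketch of deriving the schedulable amount from a resource scheduling view is given below. The description only states that the schedulable amount is obtained from the total amount and the usage; computing it as total minus used is an assumption, as are the class name, the set names, and the example figures.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SchedulingView:
    """Per-set view holding the total amount and the amount currently in use."""
    total: float
    used: float

    @property
    def schedulable(self) -> float:
        # Assumption: schedulable amount = total - used, never below zero.
        return max(self.total - self.used, 0.0)


def schedulable_by_type(
    views: Dict[str, SchedulingView], candidate_types: List[str]
) -> Dict[str, float]:
    """Look up the schedulable amount for each candidate type that has a view."""
    return {t: views[t].schedulable for t in candidate_types if t in views}


if __name__ == "__main__":
    views = {
        "whole_card": SchedulingView(total=64.0, used=60.0),
        "virtualized": SchedulingView(total=16.0, used=4.5),
        "offline_mixed": SchedulingView(total=20.0, used=20.0),
    }
    print(schedulable_by_type(views, ["whole_card", "virtualized"]))
```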
[0178] In one possible implementation, before executing the target computing task using the computing resources corresponding to the target resource type, the task execution unit 1104 is further configured to:
[0179] Update the resource status of the computing resources corresponding to the target resource type from idle to allocated;
[0180] After executing the target computing task using the computing resources corresponding to the target resource type, the task execution unit 1104 is further configured to:
[0181] Once the target computing task is completed, update the resource status of the computing resources corresponding to the target resource type to idle.
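The status updates around task execution (idle to allocated before execution, back to idle after completion) can be sketched with a context manager. The status table, the resource identifier, and the decision to release the resource even when the task raises an exception are assumptions made for illustration.

```python
from contextlib import contextmanager

IDLE = "idle"
ALLOCATED = "allocated"


@contextmanager
def allocated(status, resource_id):
    """Mark a resource as allocated for the duration of a task, then release it."""
    if status.get(resource_id) != IDLE:
        raise RuntimeError(f"resource {resource_id!r} is not idle")
    status[resource_id] = ALLOCATED
    try:
        yield resource_id
    finally:
        # Assumption: the resource is returned to idle even if the task fails.
        status[resource_id] = IDLE


if __name__ == "__main__":
    status = {"gpu-0": IDLE}
    with allocated(status, "gpu-0"):
        print("during task:", status["gpu-0"])
    print("after task:", status["gpu-0"])
```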
[0182] In one possible implementation, when the target computing task is executed using the computing resources corresponding to the target resource type, the task execution unit 1104 is specifically used for:
[0183] Using the computing resources corresponding to the target resource type, a target computing power container is constructed, which is used to provide a runtime environment for executing the target computing task;
[0184] The target computing task is executed within the target computing container.
[0185] In one possible implementation, the task configuration information further includes: the task priority of the target computation task;
[0186] When constructing the target computing power container using the computing resources corresponding to the target resource type, the task execution unit 1104 is specifically used for:
[0187] If the target resource type is an offline mixed-distribution resource, then the resource allocation order of the target computing task is determined according to the task priority; wherein, the offline mixed-distribution resource represents the idle resources in the allocated resources used for other computing services, and the allocated resources include the computing resources of the physical class and / or the computing resources of the virtual class;
[0188] According to the resource allocation order, an offline mixed-distribution container is constructed as the target computing power container using the computing resources corresponding to the mixed-distribution resources.
[0189] In one possible implementation, the target computing task is a computing task of an artificial intelligence model, and the association between the task information and the resource type includes: the mapping relationship between resource demand points and resource types;
[0190] When selecting at least one candidate resource type associated with the task-related information from among the reference resource types based on the preset association between task information and resource types, the resource selection unit 1102 is specifically used for:
[0191] Based on the keywords contained in the task-related information, extract at least one resource requirement point from the task-related information;
[0192] Based on the mapping relationship between the resource requirement points and resource types, candidate resource types corresponding to each of the at least one resource requirement point are selected from each reference resource type as at least one candidate resource type associated with the task-related information.
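The two-stage selection above (keywords to resource requirement points, then requirement points to candidate resource types) can be sketched as two table lookups. The requirement point names follow the description, but the keyword table and the mapping from requirement points to resource types below are entirely hypothetical; the description says such a mapping is preset without fixing its contents.

```python
from typing import List

# Hypothetical keyword table: substring -> resource requirement point.
KEYWORD_TO_REQUIREMENT = {
    "stable": "high computing power stability",
    "fast training": "high training efficiency",
    "low cost": "low utilization cost",
    "elastic": "elastic scheduling",
    "fine-grained": "fine-grained resource types",
}

# Hypothetical preset mapping: requirement point -> resource types.
REQUIREMENT_TO_TYPES = {
    "high computing power stability": ["whole_machine", "whole_card"],
    "high training efficiency": ["whole_machine", "whole_card"],
    "low utilization cost": ["virtualized", "offline_mixed"],
    "elastic scheduling": ["offline_mixed"],
    "fine-grained resource types": ["virtualized"],
}


def extract_requirement_points(task_info: str) -> List[str]:
    """Return the requirement points whose keyword occurs in the task description."""
    text = task_info.lower()
    return [req for keyword, req in KEYWORD_TO_REQUIREMENT.items() if keyword in text]


def select_candidate_types(task_info: str) -> List[str]:
    """Collect candidate resource types for all extracted requirement points."""
    candidates: List[str] = []
    for requirement in extract_requirement_points(task_info):
        for resource_type in REQUIREMENT_TO_TYPES.get(requirement, []):
            if resource_type not in candidates:  # keep first-seen order, no duplicates
                candidates.append(resource_type)
    return candidates


if __name__ == "__main__":
    print(select_candidate_types("Needs low cost and elastic capacity"))
```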
[0193] For ease of description, the above sections are divided into modules (or units) according to their functions and described separately. Of course, in implementing this application, the functions of each module (or unit) can be implemented in one or more software or hardware components.
[0194] Regarding the apparatus in the above embodiments, the specific manner in which each unit performs its operations has been described in detail in the embodiments related to the method, and will not be elaborated here.
[0195] Those skilled in the art will understand that various aspects of this application can be implemented as a system, method, or program product. Therefore, various aspects of this application can be specifically implemented in the following forms: a completely hardware implementation, a completely software implementation (including firmware, microcode, etc.), or a combination of hardware and software implementations, collectively referred to herein as a "circuit," "module," or "system."
[0196] Based on the same inventive concept, embodiments of this application also provide an electronic device. In one embodiment, the electronic device can be a server or a terminal device. Figure 12 is a schematic diagram of the structure of a possible electronic device provided in an embodiment of this application. As shown in Figure 12, the electronic device 1200 includes a processor 1210 and a memory 1220.
[0197] The memory 1220 stores a computer program that can be executed by the processor 1210. The processor 1210 can execute the steps of the above-mentioned computing resource allocation method by executing the instructions stored in the memory 1220.
[0198] Memory 1220 may be volatile memory, such as random-access memory (RAM); memory 1220 may also be non-volatile memory, such as read-only memory (ROM), flash memory, hard disk drive (HDD), or solid-state drive (SSD); or memory 1220 may be any other medium capable of carrying or storing desired program code in the form of instructions or data structures and accessible by a computer, but is not limited thereto. Memory 1220 may also be a combination of the above-described memories.
[0199] The processor 1210 may include one or more central processing units (CPUs) or digital processing units, etc. The processor 1210 implements the above-described computing resource allocation method when executing computer programs stored in the memory 1220.
[0200] In some embodiments, the processor 1210 and the memory 1220 may be implemented on the same chip, while in other embodiments they may be implemented on separate chips.
[0201] The embodiments of this application do not limit the specific connection medium between the processor 1210 and the memory 1220. As an example, the processor 1210 and the memory 1220 are connected via a bus, which is drawn as a thick line in Figure 12; the connections between other components are merely illustrative and not limiting. Buses can be categorized as address buses, data buses, control buses, etc. For ease of description, Figure 12 uses only one thick line, but this does not indicate that there is only one bus or one type of bus.
[0202] Based on the same inventive concept, embodiments of this application provide a computer-readable storage medium including a computer program. When the computer program is run on an electronic device, it causes the electronic device to perform the steps of the aforementioned computing resource allocation method. In some possible implementations, various aspects of the computing resource allocation method provided in this application can also be implemented as a program product including a computer program. When the program product is run on an electronic device, the computer program causes the electronic device to perform the steps in the aforementioned computing resource allocation method. For example, the electronic device can perform the steps shown in Figure 2.
[0203] The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of readable storage media (a non-exhaustive list) include: electrical connections having one or more wires, portable disks, hard disks, RAM, ROM, erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.
[0204] The program product of the embodiments of this application may be a CD-ROM and include a computer program, and may run on an electronic device. However, the program product of this application is not limited thereto. In this document, the readable storage medium may be any tangible medium that contains or stores a computer program that may be used by or in conjunction with an instruction execution system, apparatus, or device.
[0205] A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying a readable computer program. This propagated data signal may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium, capable of sending, propagating, or transmitting a computer program for use by or in conjunction with an instruction execution system, apparatus, or device.
[0206] Although preferred embodiments of this application have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments as well as all changes and modifications falling within the scope of this application.
[0207] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of this application. Therefore, if such modifications and variations fall within the scope of the claims of this application and their equivalents, this application also intends to include such modifications and variations.
Claims
1. A method for allocating computing resources, characterized in that the method comprises: when an execution operation is triggered for a target computing task, obtaining task configuration information of the target computing task, wherein the task configuration information includes a target resource quantity and task-related information, and the task-related information represents the execution requirements during execution of the target computing task; selecting, based on a preset association between task information and resource types, at least one candidate resource type associated with the task-related information from among reference resource types, wherein each reference resource type is a physical computing resource or a virtual computing resource, and the virtual computing resource is obtained by partitioning the physical computing resource using virtualization technology; obtaining the schedulable amount of computing resources corresponding to each of the at least one candidate resource type; and if, among the at least one candidate resource type, there exists a target resource type whose schedulable amount satisfies the target resource quantity, executing the target computing task using the computing resources corresponding to the target resource type.
2. The method as described in claim 1, characterized in that, The physical computing resources include at least one of whole machine resources and whole card resources, wherein the whole machine resources represent the computing resources provided by a computing device, and the whole card resources represent the computing resources provided by a portion of the computing units in a computing device.
3. The method as described in claim 1, characterized in that, The virtual computing resources include: virtualized resources, which represent the computing resources provided by a portion of the logical units within a computing unit.
4. The method as described in claim 1, characterized in that, Each of the reference resource types is a computing resource of the physical class, a computing resource of the virtual class, or an offline mixed-use resource, wherein the offline mixed-use resource represents an idle resource among the allocated resources used for other computing services, and the allocated resources include the computing resources of the physical class and / or the computing resources of the virtual class.
5. The method according to any one of claims 1-4, characterized in that, The step of obtaining the schedulable amount of computing resources corresponding to each of the at least one candidate resource type includes: From a resource pool containing various types of computing resources, at least one set of computing resources is divided out, wherein the at least one set of computing resources includes the set of computing resources corresponding to the at least one candidate resource type; Based on the total amount of resources and resource usage of each of the at least one set of computing resources, the schedulable amount of computing resources corresponding to each of the at least one candidate resource type is obtained.
6. The method as described in claim 5, characterized in that, The step of obtaining the schedulable amount of computing resources corresponding to each of the at least one candidate resource types based on the total resource amount and resource usage of each of the at least one set of computing resources includes: Obtain the resource scheduling view corresponding to each of the at least one set of computing resources, and each resource scheduling view contains the total amount of resources and the resource usage of the corresponding set of computing resources; Based on the total amount of resources and resource usage in the resource scheduling view, the schedulable amount of computing resources corresponding to each of the at least one candidate resource type is obtained.
7. The method as described in claim 5, characterized in that, before executing the target computing task using the computing resources corresponding to the target resource type, the method further includes: updating the resource status of the computing resources corresponding to the target resource type from idle to allocated; and after executing the target computing task using the computing resources corresponding to the target resource type, the method further includes: once the target computing task is completed, updating the resource status of the computing resources corresponding to the target resource type to idle.
8. The method according to any one of claims 1-4, characterized in that, The step of using the computing resources corresponding to the target resource type to execute the target computing task includes: Using the computing resources corresponding to the target resource type, a target computing power container is constructed, which is used to provide a runtime environment for executing the target computing task; The target computing task is executed within the target computing container.
9. The method as described in claim 8, characterized in that, The task configuration information also includes: the task priority of the target computation task; The step of constructing a target computing power container using the computing resources corresponding to the target resource type includes: If the target resource type is an offline mixed-distribution resource, then the resource allocation order of the target computing task is determined according to the task priority; wherein, the offline mixed-distribution resource represents the idle resources in the allocated resources used for other computing services, and the allocated resources include the computing resources of the physical class and / or the computing resources of the virtual class; According to the resource allocation order, an offline mixed-distribution container is constructed as the target computing power container using the computing resources corresponding to the mixed-distribution resources.
10. The method according to any one of claims 1-4, characterized in that, The association between task information and resource types includes: the mapping relationship between resource demand points and resource types; Based on the preset association between task information and resource types, at least one candidate resource type associated with the task-related information is selected from each reference resource type, including: Based on the keywords contained in the task-related information, extract at least one resource requirement point from the task-related information; Based on the mapping relationship between the resource requirement points and resource types, candidate resource types corresponding to each of the at least one resource requirement point are selected from each reference resource type as at least one candidate resource type associated with the task-related information.
11. A computing resource allocation device, characterized in that the device comprises: an information acquisition unit, configured to acquire task configuration information of a target computing task when an execution operation is triggered for the target computing task, wherein the task configuration information includes a target resource quantity and task-related information, and the task-related information represents the execution requirements during execution of the target computing task; a resource selection unit, configured to select, based on a preset association between task information and resource types, at least one candidate resource type associated with the task-related information from among reference resource types, wherein each reference resource type is a physical computing resource or a virtual computing resource, and the virtual computing resource is obtained by partitioning the physical computing resource using virtualization technology; a quota acquisition unit, configured to acquire the schedulable amount of computing resources corresponding to each of the at least one candidate resource type; and a task execution unit, configured to execute the target computing task using the computing resources corresponding to the target resource type if there is a target resource type among the at least one candidate resource type whose schedulable amount satisfies the target resource quantity.
12. An electronic device, characterized in that, It includes a processor and a memory, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of any of the methods described in claims 1 to 10.
13. A computer-readable storage medium, characterized in that, It includes a computer program that, when run on an electronic device, causes the electronic device to perform the steps of any of the methods described in claims 1 to 10.
14. A computer program product, characterized in that it includes a computer program stored in a computer-readable storage medium, wherein a processor of an electronic device reads the computer program from the computer-readable storage medium and executes it, causing the electronic device to perform the steps of any of the methods described in claims 1 to 10.