A device resource application method and device, electronic device, and storage medium
By constructing task request groups to match the total number of GPU cards in the machine to request GPU device resources, the problem of low device resource utilization is solved, and more efficient resource utilization and task execution efficiency are achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TENCENT TECHNOLOGY (SHENZHEN) CO LTD
- Filing Date
- 2021-05-25
- Publication Date
- 2026-06-26
AI Technical Summary
In existing technologies, there are many fragmented resources when applying for equipment resources, resulting in low resource utilization.
By determining the individual container demand of task requests in the current task request set, and combining this with the total number of GPUs in the device cluster, task request groups are constructed to match the total number of GPUs, and corresponding GPU device resources are requested.
It reduces the occurrence of fragmented resources and improves the utilization rate of equipment resources, especially improving task execution efficiency when there are sufficient task requests.
Figure CN115391001B_ABST
Description
Technical Field
[0001] This application relates to the field of Internet communication technology, and in particular to a method, apparatus, electronic device and storage medium for applying for equipment resources.
Background Technology
[0002] Devices, acting as resource providers, can offer resources for processing related tasks. These resources can be GPU (Graphics Processing Unit) resources. Devices typically provide resources in the form of containers, which are created according to the specifications given by the task. In related technologies, different tasks have different specifications, and each individual device has limited resources. When device resources are requested to process related tasks, any device whose resources are not requested in full is left with unused, fragmented resources. As the number of tasks increases, fragmented resources continue to accumulate, which lowers device resource utilization. Therefore, solutions are needed to reduce the occurrence of fragmented resources and effectively ensure device resource utilization.
Summary of the Invention
[0003] To address the problems of fragmented resources and inability to guarantee resource utilization in existing technologies when applying for equipment resources, this application provides a method, apparatus, electronic device, and storage medium for applying for equipment resources:
[0004] According to a first aspect of this application, a method for requesting equipment resources is provided, the method comprising:
[0005] Determine the individual container demand for the GPU resource card indicated by the task requests in the current task request set;
[0006] Determine the total number of GPUs in the device cluster;
[0007] Based on the total number of GPU cards and the single container demand indicated by the task request, a task request group is constructed whose sum of the indicated single container demand matches the total number of GPU cards, so as to apply for the corresponding GPU device resources based on the task request group.
[0008] According to a second aspect of this application, a device for requesting equipment resources is provided, the device comprising:
[0009] Task Request Response Module: Used to determine the demand for a single container on the GPU resource card indicated by the task request in the current task request set;
[0010] Total GPU Quantity Determination Module: Used to determine the total number of GPUs in the device cluster;
[0011] Task Request Group Construction Module: Based on the total number of GPU cards and the single container demand indicated by the task request, construct a task request group whose sum of the indicated single container demand matches the total number of GPU cards, so as to apply for corresponding GPU device resources based on the task request group.
[0012] According to a third aspect of this application, an electronic device is provided, the electronic device including a processor and a memory, the memory storing at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by the processor to implement the device resource request method as described in the first aspect.
[0013] According to a fourth aspect of this application, a computer-readable storage medium is provided, the storage medium storing at least one instruction or at least one program segment, the at least one instruction or the at least one program segment being loaded and executed by a processor to implement the device resource request method as described in the first aspect.
[0014] According to a fifth aspect of this application, a computer program product or computer program is provided, comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the device resource request method as described in the first aspect.
[0015] The method, apparatus, electronic device, and storage medium for applying for equipment resources provided in this application have the following technical advantages:
[0016] This application determines the individual container demand for GPU resource cards indicated by task requests in the current task request set. Then, based on the total number of GPU cards in the device and the individual container demand indicated by the task requests, it constructs task request groups whose sum of the indicated individual container demands matches the total number of GPU cards in the device. Based on these task request groups, corresponding GPU device resources are requested. This grouping at the task request level ensures that the device requests resources at the full device level as much as possible, thereby reducing fragmented GPU resource cards and effectively guaranteeing device resource utilization. Especially when there are sufficient task requests, it can improve task execution efficiency.
Attached Figure Description
[0017] To more clearly illustrate the technical solutions and advantages in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0018] Figure 1 is a schematic diagram of an application environment provided in an embodiment of this application;
[0019] Figure 2 is a flowchart illustrating a method for requesting equipment resources provided in an embodiment of this application;
[0020] Figure 3 is a schematic flowchart, provided in an embodiment of this application, of constructing a task request group whose sum of indicated single container demands matches the total number of GPU cards, based on the total number of GPU cards and the single container demands indicated by the task requests;
[0021] Figure 4 is a schematic diagram illustrating the construction of a task request group based on a first task request and a second task request, provided in an embodiment of this application;
[0022] Figure 5 is a schematic diagram of the server architecture provided in the embodiments of this application;
[0023] Figure 6 is a schematic diagram of a timeout determination provided in an embodiment of this application;
[0024] Figure 7 is a block diagram of a device resource application apparatus provided in an embodiment of this application;
[0025] Figure 8 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0026] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.
[0027] It should be noted that the terms "comprising" and "having" and any variations thereof in the specification, claims and accompanying drawings of this application are intended to cover non-exclusive inclusion. For example, a process, method, system, product or server that includes a series of steps or units is not necessarily limited to those steps or units that are explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to such processes, methods, products or devices.
[0028] Referring to Figure 1, Figure 1 is a schematic diagram of an application environment provided in an embodiment of this application. This application environment may include a client 10 and a server 20. The client 10 and the server 20 can be connected directly or indirectly via wired or wireless communication. The client sends a task request to the server. Based on the total number of GPU cards and the single container demand for GPU resource cards indicated by the task requests in the current task request set, the server constructs a task request group whose sum of indicated single container demands matches the total number of GPU cards. The server then requests the corresponding GPU device resources based on the task request group. It should be noted that Figure 1 is just one example.
[0029] Clients can include physical devices such as smartphones, desktop computers, tablets, laptops, augmented reality (AR) / virtual reality (VR) devices, digital assistants, smart speakers, and smart wearable devices. They can also include software running on these physical devices, such as computer programs. The operating systems that support the client can include Android, iOS (a mobile operating system developed by Apple), Linux, and Microsoft Windows.
[0030] The server can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. The server may include network communication units, processors, and memory, etc. The server can provide backend services to the corresponding clients.
[0031] Cloud computing refers to the delivery and usage model of IT infrastructure, meaning obtaining necessary resources through the network in an on-demand and easily scalable manner. In a broader sense, cloud computing refers to the delivery and usage model of services, meaning obtaining necessary services through the network in an on-demand and easily scalable manner. These services can be IT and software-related, internet-related, or other services. Cloud computing is a product of the development and integration of traditional computer and network technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage technologies, virtualization, and load balancing.
[0032] The following describes a specific embodiment of a method for applying for equipment resources according to this application. Figure 2 is a flowchart illustrating a device resource application method provided in this application. This application provides the operational steps described in the embodiments or flowchart, but based on conventional or non-inventive effort, the method may include more or fewer operational steps. The order of steps listed in the embodiments is merely one possible execution order among many and does not represent the only possible execution order. In actual system or product execution, the methods shown in the embodiments or drawings can be executed sequentially or in parallel (e.g., in a parallel processor or multi-threaded processing environment). Specifically, as shown in Figure 2, the method may include:
[0033] S201: Determine the single container demand for the GPU resource card indicated by the task request in the current task request set;
[0034] In this embodiment, the server receives task requests, and the unprocessed task requests together form the current task request set. The current task request set can contain at least one task request, which may carry an indication of the single container demand and the total demand for GPU resource cards. Graphics Processing Units (GPUs) are increasingly popular due to their superior computing power and are frequently used for computational processing in various scenarios, such as training artificial intelligence (AI) models. GPU devices configured with GPU resource cards provide GPU resources for task processing in the form of containers, where containers are created based on the single container demand for GPU resource cards indicated in the task request. For example, if the single container demand indicated in the task request is 2, each container created for the task is configured with 2 GPU resource cards. If the total demand for GPU resource cards indicated in the task request is 6, the task requires 3 containers, each configured with 2 GPU resource cards.
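The relationship between the total demand and the single container demand in the example above can be sketched as follows. This is a minimal illustration only; the function name `container_count` is hypothetical, and the requirement that the total demand be a whole multiple of the single container demand is an assumption that the description does not state.

```python
def container_count(total_demand: int, per_container_demand: int) -> int:
    """Number of containers implied by a task request (hypothetical helper)."""
    if per_container_demand <= 0:
        raise ValueError("single container demand must be positive")
    if total_demand % per_container_demand != 0:
        # Assumption: the total demand divides evenly into containers.
        raise ValueError("total demand is not a multiple of the single container demand")
    return total_demand // per_container_demand


# Example from the description: total demand 6, single container demand 2.
containers = container_count(6, 2)
```

With the values from the description, `containers` is 3.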
[0035] For the current task request set, the single container demand for GPU resource cards can be determined for all task requests or only for a subset of them. The subset can consist of task requests triggered by a relevant task policy, such as selecting requests in a given order (e.g., those ranked ahead of a first preset position) or selecting requests in order of priority (e.g., those ranked ahead of a second preset position). Furthermore, the single container demand indicated by a particular task request can be determined either when the task request is received or when the task request is triggered by a relevant task policy.
[0036] With reference to Figure 5, the three types of received task requests are introduced below:
[0037] 1) Type 1 Task Request (Task Submission via Page): The server receives Type 1 task requests submitted by the target object through an interactive interface. The client provides a user interface (UI) for interaction. The target object (e.g., a regular user or staff member) can submit tasks through the user interface, and the client then sends a task request to the server based on the task. The task submitted by the target object through the user interface can be generated by the target object filling in task information on the relevant page using Object Storage Service (OSS) page tools. The filled-in task information can include the single container requirement for the GPU resource card, the total requirement for the GPU resource card, the path directory of the task output log, the path directory of the task result output, etc. When the task type is an AI model training task, the task information can also include the image information of the model used for training. After filling in the task information, the target object can trigger interactive operation controls to submit the task, such as clicking the relevant button to submit the task.
[0038] 2) Type 2 Task Request (Scheduled Task): The server receives Type 2 task requests, which are generated periodically based on the target business. Scheduled task requests pointing to the target business are automatically generated and sent to the server at each target time point. The target business can be set in the backend of the server in the device resource application scheme provided in this application, or it can be set by other parties. The target business can refer to a specific internet product, a type of internet product, or a functional module of an internet product. The target time point can be at least one fixed time point within a preset period, such as every hour of the day. The scheduled task corresponding to the scheduled task request can be considered as customized by the target business based on the target time point. The task information carried in the scheduled task request can include the single container demand for GPU resource cards, the total demand for GPU resource cards, the path directory of the task output log, the path directory of the task result output, etc. When the task type is an AI model training task, the scheduled task can be the processing of incremental training data within a preset period (e.g., one day), and the task information can also include the image information of the model used in the training task.
[0039] 3) Type 3 Task Request (Interface Submission Task): The server receives Type 3 task requests, which are submitted via an interface. The server provides an Application Programming Interface (API) for connecting to the user client. The user client can submit tasks through this interface according to its own backend logic, and then send task requests to the server based on those tasks. The user client here can be the client corresponding to the server in the device resource application scheme provided in this application, or it can be a client of other parties. When tasks are submitted via the API, the protocol fields required by the API can include the single container demand for GPU resource cards, the total demand for GPU resource cards, the path directory of the task output log, the path directory of the task result output, etc. When the task type is an AI model training task, the fields can also include the image information of the model used for training.
[0040] The task request received by the server can be any of the three types mentioned above. These three types of task requests enrich the ways tasks can be generated, meet the needs of more task generation scenarios, and allow the server to choose a more suitable and convenient way to send task requests based on the specific scenario requirements.
[0041] With reference to the architecture diagram shown in Figure 5, the user layer in the server can receive task requests, and the task layer in the server can determine the single container demand and the total demand for GPU resource cards indicated by each task request.
[0042] As one possible implementation, as shown in Figure 6, the method also includes a timeout determination process: first, determining the interval between the current time and the time at which the task request was received; then, when the interval is less than a preset time, waiting for a notification indicating successful construction of the task request group; and when the interval is greater than or equal to the preset time, requesting the corresponding GPU device resources based on the task request itself.
[0043] The first business logic constructs a task request group in step S203 and applies for the corresponding GPU device resources based on that group. Compared with it, the second business logic, which applies for the corresponding GPU device resources based on the individual task request, can to a certain extent ensure that task requests are processed in a timely manner, avoid leaving unprocessed those task requests that could not be grouped, and thus ensure the response efficiency of task requests.
[0044] For example, if the server receives task request 1 at 12:00 and the current time is 12:01, the interval is 1 minute. If the preset time is 5 minutes, since the interval (1 minute) is less than the preset time (5 minutes), it waits for a notification indicating successful construction of the task request group. If the preset time is 30 seconds, since the interval (1 minute) is greater than the preset time (30 seconds), it requests the corresponding GPU device resources based on task request 1. The preset time can be seen as a timeout configured for the first business logic. The first business logic instructs task grouping and overall machine resource scheduling. If the task grouping corresponding to a certain task request is not completed within the timeout period, it will no longer wait for task grouping for that task request and will request the GPU device resources corresponding to that task request according to the second business logic. If the task grouping corresponding to a certain task request is completed within the timeout period, it will continue to request the GPU device resources corresponding to the relevant task request group according to the first business logic. It should be noted that the preset time can be flexibly initialized as needed.
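The timeout determination described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `choose_request_path`, the returned labels, and the `group_ready` flag (standing in for the notification of successful group construction) are hypothetical and do not come from the description.

```python
from datetime import datetime, timedelta


def choose_request_path(received_at: datetime, now: datetime,
                        preset: timedelta, group_ready: bool = False) -> str:
    """Decide how a pending task request should proceed (illustrative only).

    Returns "group"  -> first business logic: request resources via the task request group
            "wait"   -> keep waiting for the group-construction notification
            "single" -> second business logic: request resources for this request alone
    """
    if group_ready:
        # Assumption: the notification arrived while the request was still pending.
        return "group"
    interval = now - received_at
    if interval < preset:
        return "wait"
    return "single"


# Values from the example: received at 12:00, current time 12:01.
received = datetime(2021, 5, 25, 12, 0)
current = datetime(2021, 5, 25, 12, 1)
with_five_minutes = choose_request_path(received, current, timedelta(minutes=5))
with_thirty_seconds = choose_request_path(received, current, timedelta(seconds=30))
```

With a preset time of 5 minutes the request keeps waiting, and with a preset time of 30 seconds it falls back to the second business logic.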
[0045] For example, the process of requesting corresponding GPU device resources based on a task request may include: first, using a saturation-first algorithm to determine candidate GPU devices from the device cluster; then, based on the total demand for GPU resource cards and the single container demand indicated by the task request, determining the corresponding GPU device resources from the candidate GPU devices.
[0046] The server can use a saturation-first algorithm to identify candidate GPU devices from the device cluster. The "saturation-first algorithm" indicates that the selection of GPU devices follows the principle of minimizing fragmentation. For example, if the task request indicates a total GPU resource card demand of 1, and the device cluster has GPU device 1 with 2 remaining cards and GPU device 2 with 3 remaining cards, then GPU device 1 is selected as the candidate GPU device, which makes it more likely that the GPU resources in GPU device 1 are fully used. Then, based on the total GPU resource card demand and the single container demand indicated by the task request, the corresponding GPU device resources are determined from the candidate GPU devices. In practical applications, as shown in Figure 5, the resource layer includes a resource scheduling layer and GPU device nodes. The resource scheduling layer matches GPU resources and produces GPU containers based on task requests.
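One way to read the saturation-first selection in the example above is as a best-fit choice: among the devices that can satisfy the demand, pick the one with the fewest remaining cards. The sketch below follows that reading. The description does not define the algorithm beyond the example, so the best-fit rule, the function name `pick_candidate_device`, and the device names are assumptions.

```python
from typing import Dict, Optional


def pick_candidate_device(free_cards: Dict[str, int], demand: int) -> Optional[str]:
    """Pick the device with the fewest free cards that still fits the demand.

    Illustrative best-fit interpretation of "saturation-first"; returns None
    when no device has enough free cards.
    """
    fitting = {name: free for name, free in free_cards.items() if free >= demand}
    if not fitting:
        return None
    # Ties on free cards are broken by device name to keep the choice deterministic.
    return min(fitting, key=lambda name: (fitting[name], name))


# Example from the description: demand 1, device 1 has 2 free cards, device 2 has 3.
chosen = pick_candidate_device({"gpu-device-1": 2, "gpu-device-2": 3}, 1)
```

Here `chosen` is `"gpu-device-1"`, matching the example in which GPU device 1 is selected.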
[0047] S202: Determine the total number of GPU cards in the device cluster;
[0048] In this embodiment, the total number of GPU cards in a GPU device refers to the number of GPU resource cards configured in the GPU device, such as a GPU device with a total number of 8 cards. Of course, the GPU device cluster may also include GPU devices configured with other total numbers of cards besides 8, such as 6 cards, 16 cards, etc.
[0049] S203: Based on the total number of GPU cards and the single container demand indicated by the task request, construct a task request group whose sum of the indicated single container demand matches the total number of GPU cards, and apply for corresponding GPU device resources based on the task request group.
[0050] In this embodiment, the individual container requirements indicated by the task requests in the current task request set vary, and the total number of GPU cards in a single device is also limited. To reduce GPU resource card fragmentation, the individual container requirements indicated by different task requests can be aggregated based on the total number of GPU cards, thereby achieving task grouping to request total system resources. For example, if task request a indicates a single container requirement of 2 for the GPU resource card, and task request b indicates a single container requirement of 6 for the GPU resource card, and the total number of GPU cards is 8, then a task request group can be constructed based on task request a and task request b to request the total system resources of a GPU device with a total number of GPU cards of 8.
[0051] As one possible implementation, as shown in Figure 3, the step of constructing a task request group whose sum of indicated single container demands matches the total number of GPU cards, based on the total number of GPU cards and the single container demand indicated by the task requests, includes:
[0052] S301: Select the highest priority task request from the current task request set and remove it from the set; wherein, each task request in the current task request set carries a corresponding priority;
[0053] S302: When the single container demand indicated by the currently selected task request is less than the total number of GPU cards, select the highest priority task request from the current task request set (from which the selected task requests have been removed), remove it from the set, and calculate the sum of the single container demands of all currently selected task requests to obtain the target value;
[0054] S303: When the target value is equal to the total number of GPU cards, construct the task request group based on all currently selected task requests;
[0055] S304: When the target value is less than the total number of GPU cards, return to the step of selecting the highest priority task request from the current task request set (from which the selected task requests have been removed) and continue execution until the target value is equal to the total number of GPU cards, and then construct the task request group based on all currently selected task requests.
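Steps S301 to S304 can be sketched as a priority-ordered accumulation. This is a minimal illustration, not a definitive implementation: the `TaskRequest` type, the function name `build_task_request_group`, and the convention that a larger `priority` value means higher priority are all assumptions. The sketch returns `None` when the sum overshoots the total or the set runs out, since those cases are not covered by the four steps themselves.

```python
from typing import List, NamedTuple, Optional


class TaskRequest(NamedTuple):
    name: str
    priority: int        # assumption: larger value = higher priority
    per_container: int   # single container demand for GPU resource cards


def build_task_request_group(pending: List[TaskRequest],
                             total_cards: int) -> Optional[List[TaskRequest]]:
    """Select requests by priority until their demands sum to total_cards."""
    queue = sorted(pending, key=lambda r: r.priority, reverse=True)
    selected: List[TaskRequest] = []
    target = 0
    for request in queue:
        selected.append(request)          # S301/S302: take the highest-priority remaining request
        target += request.per_container   # S302: sum of demands of all selected requests
        if target == total_cards:         # S303: exact match, the group is complete
            return selected
        if target > total_cards:          # not covered by S301-S304
            return None
        # S304: target < total_cards, continue with the next request
    return None                           # set exhausted before reaching the total


requests = [
    TaskRequest("task1", 6, 2),
    TaskRequest("task2", 5, 4),
    TaskRequest("task3", 4, 2),
    TaskRequest("task4", 3, 1),
]
group = build_task_request_group(requests, total_cards=8)
group_names = [r.name for r in group] if group else []
```

With single container demands of 2, 4 and 2 and a total of 8 GPU cards, the group consists of task1, task2 and task3.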
[0056] The currently unprocessed task requests together constitute the current task request set. Each task request in the current task request set carries a corresponding priority. The priority of a task request can be configured by the server based on a task priority strategy. The server can configure the corresponding priority for task requests based on the order in which the tasks were received. Specifically, task requests received earlier have a higher priority than task requests received later. The server can also configure the corresponding priority for a task request based on the priority parameter information carried by the task request. Priority parameter information can include timeliness information and task type information. Timeliness information indicates whether the task request needs to be processed first, and task type information indicates the type of task the task request belongs to (such as the aforementioned page submission task, scheduled task, and API submission task), and each task type is configured with a priority weight.
[0057] For example, the current task request set includes task requests 1-6, and their priority order is task request 1> task request 2> task request 3> task request 4> task request 5> task request 6.
[0058] Select the highest priority task request (task request 1) from the current task request set and remove it from the set. After removing the selected task request, the current task request set will include task requests 2-6.
[0059] If the single container requirement (e.g., 8) indicated by the currently selected task request (task request 1) is equal to the total number of GPUs (e.g., 8), then the corresponding GPU device's total resources can be requested based on the currently selected task request (task request 1).
[0060] If the single container requirement (e.g., 2) indicated by the currently selected task request (task request 1) is less than the total card quantity (e.g., 8), then the highest priority task request is selected from the current task request set after removing the selected task request and removed from the set. That is, the highest priority task request (task request 2) is selected from tasks 2-6. At this time, the current task request set after removing the selected task request includes tasks 3-6. Furthermore, the sum of the single container requirements of all currently selected task requests (task request 1 and task request 2) is calculated, that is, the sum of the single container requirements indicated by task request 1 (e.g., 2) and task request 2, to obtain the target value.
[0061] If the single container requirement indicated by Task Request 2 is 6, then the target value (8) is equal to the total number of GPUs (e.g., 8). Then, a task request group is constructed based on Task Request 1 and Task Request 2, and the total resources of the corresponding GPU device are requested based on the task request group.
[0062] If the single container requirement indicated by task request 2 is 4, then the target value (6) is less than the total card quantity (e.g., 8). Then, return to the step of selecting the highest priority task request in the current task request set after removing the selected task request and continue execution until the target value is equal to the total card quantity. Then, construct a task request group based on all currently selected task requests, and apply for the corresponding GPU device's total resources based on the task request group.
[0063] For example, in the current task request set (task requests 3-6) after removing the selected task requests, the highest priority task request (task request 3) is selected and removed from the set. At this time, the current task request set after removing the selected task requests includes task requests 4-6. Then, the sum of the individual container requirements of all currently selected task requests (task request 1, task request 2, and task request 3) is calculated, that is, the sum of the individual container requirements indicated by task request 1 (e.g., 2), task request 2 (e.g., 4), and task request 3 is calculated to obtain the target value. If the individual container requirement indicated by task request 3 is 2, then the target value (8) is equal to the total number of GPU cards (e.g., 8). Then, a task request group is constructed based on task request 1, task request 2, and task request 3, and the corresponding GPU device's total resources are requested based on the task request group. If the single container requirement indicated by task request 3 is less than 2, then the target value is less than the total GPU resources (e.g., 8). In this case, the process returns to the step of selecting the highest priority task request from the current task request set after removing the selected task request, and continues until the target value equals the total GPU resources (e.g., 8). Then, a task request group is built based on all the currently selected task requests, and the corresponding GPU resources are requested based on the task request group.
[0064] In practical applications, task request aggregation and task grouping can be handled by the task layer on the server side, which completes the construction of task request groups before task requests are sent to the resource layer, as shown in Figure 5. Suppose the total number of GPU cards is 8, and there are 2 tasks submitted via the page (task1, task2), 1 scheduled task (task3), and 2 tasks submitted via the interface (task4, task5). If task1 requires a container size of M and task3 requires a container size of 8-M, then task1 and task3 are aggregated into task_group 1 and sent to the resource layer. Without such aggregation, a GPU device might allocate M GPU resource cards to task1 while its remaining 8-M GPU resource cards are not allocated to task3; as the number of unallocated cards increases, the GPU fragmentation rate also increases. If task2 requires a resource size of X, task4 requires a resource size of Y, and task5 requires a resource size of 8-X-Y, then task2, task4, and task5 are aggregated into task_group 2 and sent to the resource layer for production. Each aggregated task_group sends the resource layer a request for a total of 8 GPU cards, so the resource layer can allocate resources in units of whole machines without generating GPU fragmentation. This alleviates the resource fragmentation caused by differing container specifications in related technologies.
[0065] The above-described task request grouping scheme takes into account the priority of task requests. It prioritizes building task request groups consisting of high-priority task requests to allocate the corresponding GPU resources, ensuring that high-priority task requests are responded to first and guaranteeing the quality of server-side task request processing. This resource allocation model, which consolidates the sequential resource requests of individual tasks into batches, improves the efficiency of device resource matching. This is especially beneficial for situations with abundant task requests (such as AI model training tasks with a large number of requests and a high generation frequency), as it can improve task execution efficiency.
[0066] For cases where the target value exceeds the total GPU capacity, the most recently selected task request (task request 3) from all currently selected task requests (e.g., tasks 1-3) can be excluded from the candidate scope for building a task request group. After successfully building the task request group, the most recently selected task request can be moved back into the set. Of course, if the single container requirement indicated by the most recently selected task request equals the total GPU capacity, then the corresponding GPU resources can be requested based on the most recently selected task request.
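The overshoot handling described in this paragraph can be sketched as below. This is only an illustration under assumptions: the names Req and build_group_with_set_aside are hypothetical, a larger priority number is taken to mean higher priority, and the text does not specify what happens to the selected requests when no exact match is found, so the sketch simply returns them to the set.

```python
from typing import NamedTuple


class Req(NamedTuple):
    name: str
    priority: int  # assumption: larger number means higher priority
    cards: int     # single container demand, in GPU cards


def build_group_with_set_aside(pending, total_cards):
    """Return (group, whole_machine, remaining).

    A request that would push the target value past total_cards is set aside
    and returned to the set afterwards. If its own demand equals total_cards,
    it is reported separately so it can be requested on its own.
    """
    queue = sorted(pending, key=lambda r: r.priority, reverse=True)
    selected, set_aside, whole_machine = [], [], []
    target = 0
    while queue and target < total_cards:
        request = queue.pop(0)
        if target + request.cards <= total_cards:
            selected.append(request)
            target += request.cards
        elif request.cards == total_cards:
            whole_machine.append(request)  # fills a machine by itself
        else:
            set_aside.append(request)      # excluded from this group only
    if target == total_cards:
        group, returned = selected, set_aside
    else:
        group, returned = None, selected + set_aside  # assumption: no group, restore all
    remaining = sorted(returned + queue, key=lambda r: r.priority, reverse=True)
    return group, whole_machine, remaining


pending = [
    Req("task1", 5, 2),
    Req("task2", 4, 4),
    Req("task3", 3, 5),  # 2 + 4 + 5 > 8, so it is set aside
    Req("task4", 2, 8),  # equals the whole machine on its own
    Req("task5", 1, 2),
]
group, whole_machine, remaining = build_group_with_set_aside(pending, 8)
print([r.name for r in group])          # ['task1', 'task2', 'task5']
print([r.name for r in whole_machine])  # ['task4']
print([r.name for r in remaining])      # ['task3']
```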
[0067] Steps S201-S202 above are equivalent to constructing task request groups from a fixed set of current task requests at a specific point in time. The time precision of this point in time can be very high. This fixed set of current task requests is determined based on this point in time.
[0068] Of course, since the current task request set is dynamically changing, the highest priority task request A can be determined from the dynamically changing current task request set according to the relevant task strategy. When the single container demand indicated by this task request is less than the total card quantity, the highest priority task request B is then determined from the dynamically changing current task request set, and the sum of the indicated single container demands is calculated. The steps of determining the highest priority task request from the dynamically changing current task request set and calculating the sum of the indicated single container demands are repeated until the sum of the indicated single container demands equals the total card quantity, thereby constructing a task request group. Task request B in "determining the highest priority task request B from the dynamically changing current task request set" can be added during time intervals.
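A dynamically changing request set can be modeled with a priority queue that accepts new requests between selections. The sketch below uses Python's standard heapq module; the class DynamicRequestSet, the function build_group_dynamic, and the arrivals parameter (batches of requests that arrive after each selection) are hypothetical names introduced for illustration, and overshoot handling is omitted.

```python
import heapq
import itertools


class DynamicRequestSet:
    """Priority queue of task requests that can grow between selections."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: earlier arrival first

    def add(self, name, priority, cards):
        # heapq is a min-heap, so the priority is negated (larger = higher).
        heapq.heappush(self._heap, (-priority, next(self._order), name, cards))

    def pop_highest(self):
        _, _, name, cards = heapq.heappop(self._heap)
        return name, cards

    def __len__(self):
        return len(self._heap)


def build_group_dynamic(request_set, total_cards, arrivals=()):
    """Select the highest-priority request repeatedly while new ones arrive.

    arrivals yields one batch of (name, priority, cards) tuples after each
    selection, standing in for requests received during the time interval.
    """
    arrivals = iter(arrivals)
    selected, target = [], 0
    while len(request_set) > 0 and target < total_cards:
        name, cards = request_set.pop_highest()
        selected.append(name)
        target += cards
        for new_request in next(arrivals, ()):
            request_set.add(*new_request)
    return selected if target == total_cards else None


pending = DynamicRequestSet()
pending.add("A", priority=9, cards=3)
pending.add("C", priority=2, cards=5)
# Request B arrives after A has been selected and outranks C.
group = build_group_dynamic(pending, 8, arrivals=[[("B", 7, 5)]])
print(group)         # ['A', 'B']
print(len(pending))  # 1
```

Here request B is not in the set when A is selected; it is added during the interval and is then chosen ahead of the lower-priority request C.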
[0069] Furthermore, among all currently selected task requests, the task request with the highest priority is designated as the first task request, and the task request with the second highest priority is designated as the second task request. When only the first task request and the second task request are selected, before step S303, the number of first containers can be determined based on the total demand for GPU resource cards and the single container demand indicated by the first task request, and the number of second containers can be determined based on the total demand for GPU resource cards and the single container demand indicated by the second task request. During the construction of the task request group, the sum of the indicated single container demands may equal the total number of GPU resource cards in the machine while the required numbers of containers differ. The required number of containers is obtained by dividing the total demand by the single container demand. On this basis, the number of containers required by the first task request (the number of first containers) and the number of containers required by the second task request (the number of second containers) are determined here. Accordingly, step S303 includes the following steps, see Figure 4.
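The container-count arithmetic stated above (total demand divided by single container demand) can be written out as a short sketch. The function name container_count and the sample demand values are assumptions for illustration, as is the requirement that the total demand be an exact multiple of the single container demand.

```python
def container_count(total_demand, single_container_demand):
    """Number of containers = total card demand / cards per container."""
    if single_container_demand <= 0:
        raise ValueError("single container demand must be positive")
    count, leftover = divmod(total_demand, single_container_demand)
    if leftover:
        # Assumption: a request always asks for a whole number of containers.
        raise ValueError("total demand is not a multiple of the container size")
    return count


# Hypothetical pair whose single container demands sum to 8 cards (2 + 6).
first_containers = container_count(total_demand=12, single_container_demand=2)
second_containers = container_count(total_demand=24, single_container_demand=6)
print(first_containers, second_containers)  # 6 4
```

The two requests fit together card-wise (2 + 6 = 8), yet they need 6 and 4 containers respectively, which is the mismatch that steps S401-S403 address.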
[0070] S401: When the target value is equal to the total number of cards and the number of the first container is equal to the number of the second container, construct the task request group based on the first task request and the second task request;
[0071] S402: When the target value is equal to the total number of cards in the machine, and the number of the first container is greater than the number of the second container, the first task request is split into a first sub-task request indicating the number of the second container and a second sub-task request indicating the difference in the number, and the task request group is constructed based on the first sub-task request and the second task request; wherein, the difference in the number is the difference between the number of the first container and the number of the second container;
[0072] S403: When the target value is equal to the total number of cards in the machine and the number of the first container is less than the number of the second container, the second task request is split into a third sub-task request indicating the number of the first container and a fourth sub-task request indicating the difference in the number, and the task request group is constructed based on the first task request and the third sub-task request.
[0073] If the first container has 4 units and the second container also has 4 units, then the first task request and the second task request require the same number of containers. A task request group consisting of the first task request and the second task request can be constructed.
[0074] If the number of first containers is 6 and the number of second containers is 4, then the number of containers required for the first task request and the second task request are different. The first task request is split into a first sub-task request indicating the number of second containers (4) and a second sub-task request indicating the difference in number (2), and a task request group is constructed based on the first sub-task request and the second task request.
[0075] If the number of first containers is 3 and the number of second containers is 9, then the number of containers required for the first task request and the second task request are different. The second task request is split into a third sub-task request indicating the number of first containers (3) and a fourth sub-task request indicating the difference in number (6), and a task request group is constructed based on the third sub-task request and the first task request.
[0076] As for the second and fourth sub-task requests, they can be treated as independent task requests as candidate task requests for constructing other task request groups. During the construction of task request groups, the relevant descriptions in steps S301-S304 can be used to process these independent task requests.
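Steps S401-S403 and the three numeric cases above can be sketched together. This is a hedged illustration: the names Request and pair_and_split, the "-rest" suffix for the leftover sub-task request, and the per-container card sizes (2 and 6) are assumptions; only the container counts (4/4, 6/4, 3/9) come from the text.

```python
from typing import NamedTuple


class Request(NamedTuple):
    name: str
    cards: int       # single container demand, in GPU cards
    containers: int  # number of containers required


def pair_and_split(first, second, total_cards):
    """Pair two requests container by container and split off the surplus.

    Returns (group, leftover). leftover is the sub-task request that carries
    the difference in container counts, or None when the counts are equal.
    """
    if first.cards + second.cards != total_cards:
        raise ValueError("target value does not equal the total number of cards")
    paired = min(first.containers, second.containers)
    group = (first._replace(containers=paired), second._replace(containers=paired))
    leftover = None
    if first.containers > paired:     # S402: split the first task request
        leftover = first._replace(name=first.name + "-rest",
                                  containers=first.containers - paired)
    elif second.containers > paired:  # S403: split the second task request
        leftover = second._replace(name=second.name + "-rest",
                                   containers=second.containers - paired)
    return group, leftover


# S401: 4 and 4 containers, nothing to split.
print(pair_and_split(Request("first", 2, 4), Request("second", 6, 4), 8)[1])
# None

# S402: 6 and 4 containers, the first request leaves 2 containers over.
print(pair_and_split(Request("first", 2, 6), Request("second", 6, 4), 8)[1])
# Request(name='first-rest', cards=2, containers=2)

# S403: 3 and 9 containers, the second request leaves 6 containers over.
print(pair_and_split(Request("first", 2, 3), Request("second", 6, 9), 8)[1])
# Request(name='second-rest', cards=6, containers=6)
```

The leftover request can then be fed back into group construction as an independent candidate, as described for the second and fourth sub-task requests.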
[0077] In practical applications, task request aggregation and task grouping can be handled by the task layer on the server side. If the total number of GPU cards in the machine is 8, the card counts of different tasks may combine into a whole machine of 8 cards while the number of containers each task requires differs. For example, task A requires E containers of P cards each, and task B requires F containers of Q cards each. The sum of P and Q is 8 cards, but E and F are not equal. Assuming E is less than F, task A and task B can be aggregated into E whole machines, but the remaining (F-E) Q-card containers of task B are not satisfied. In this case, other tasks are preferentially sought for aggregation with the remaining (F-E) Q-card containers of task B. If aggregation fails, the corresponding task request is sent to the resource layer for production, and the resource layer can allocate GPU resources to it using a saturation-first algorithm. Because the sum of the indicated single container demands may equal the total number of cards while the required numbers of containers differ, task requests are split to improve the adaptability and flexibility of building task request groups.
[0078] As a possible implementation, after constructing, based on the total number of GPU cards in the machine and the single container demand indicated by the task requests, a task request group whose sum of indicated single container demands matches the total number of GPU cards, the method further includes the following steps: first, determining a target GPU device based on the task request group; then, creating a corresponding container on the target GPU device for each task request in the task request group according to the single container demand indicated by that task request.
[0079] The target GPU device can be determined based on its configuration information (such as the factory information of the configured GPU resource cards) and historical task processing information (such as the types of historical tasks processed and the time taken to process them). Since the GPU device cluster can also include GPU devices whose total card count is not 8, candidate GPU devices that meet the requirements can first be identified from the GPU device cluster based on the total card count indicated by the task request group (e.g., 8), excluding GPU devices with fewer than 8 cards, and the target GPU device is then determined from these candidates. Furthermore, the number of containers required by the task request group can be taken into account, so that the target GPU device can provide the required number of containers. After the target GPU device is determined, a corresponding container is created on it for each task request in the task request group, according to the single container demand indicated by that task request. Creating containers ensures that different task requests are processed in relatively independent and isolated environments on the same GPU device, improving the security of task processing and the utilization of GPU resources. In practical applications, as shown in Figure 5, the resource layer includes a resource scheduling layer and GPU device nodes. The resource scheduling layer matches GPU resources and produces GPU containers based on task request groups.
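The device matching and container creation steps can be sketched as follows. This is an illustration under stated assumptions, not the resource scheduling layer itself: the names GpuDevice, pick_target_device, and create_containers are hypothetical, a candidate is taken to be a fully idle device whose total card count equals the group's total, and ranking by configuration or historical task information is left out, so the first candidate is chosen.

```python
from dataclasses import dataclass, field


@dataclass
class GpuDevice:
    name: str
    total_cards: int
    free_cards: int
    containers: list = field(default_factory=list)


def pick_target_device(devices, group_cards):
    """Return the first idle device whose card count matches the group.

    Assumption: ranking by configuration or historical task information is
    not modeled here; candidates are taken in list order.
    """
    candidates = [d for d in devices
                  if d.total_cards == group_cards and d.free_cards == group_cards]
    return candidates[0] if candidates else None


def create_containers(device, group):
    """Create one container record per task request on the target device."""
    for task_name, cards in group:
        if cards > device.free_cards:
            raise ValueError("group does not fit on the target device")
        device.containers.append({"task": task_name, "cards": cards})
        device.free_cards -= cards


cluster = [
    GpuDevice("node-a", total_cards=4, free_cards=4),  # fewer than 8 cards
    GpuDevice("node-b", total_cards=8, free_cards=6),  # partly occupied
    GpuDevice("node-c", total_cards=8, free_cards=8),
]
group = [("task1", 2), ("task2", 4), ("task3", 2)]
target = pick_target_device(cluster, group_cards=8)
create_containers(target, group)
print(target.name, target.free_cards, len(target.containers))  # node-c 0 3
```

Because the group's demands sum to the whole machine, the target device ends with no free cards and no fragments.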
[0080] As can be seen from the technical solutions provided in the embodiments of this application above, the grouping at the task request level in the embodiments of this application can ensure that the entire machine's resources are requested from the device as much as possible, thereby reducing the number of GPU resource cards appearing in fragmented form and effectively ensuring the utilization rate of device resources. Reducing the possibility of fragmented cards from the perspective of device resource request can avoid the situation where the total number of GPU resource cards meets user needs but cannot provide GPU devices that meet container specifications. Especially for situations with sufficient task requests (such as AI model training tasks with a large number of tasks and a high generation frequency), aggregating the serial resource requests of individual tasks into batch resource requests helps reduce task startup latency and improves the execution efficiency of multiple tasks (such as the iteration efficiency of training tasks).
[0081] This application embodiment also provides a device resource application apparatus 700. As shown in Figure 7, the apparatus includes:
[0082] Task Request Response Module 701: Used to determine the single container demand for the GPU resource card indicated by the task request in the current task request set;
[0083] Total GPU Quantity Determination Module 702: Used to determine the total number of GPUs in the device cluster;
[0084] Task request group construction module 703: is used to construct a task request group whose sum of the indicated single container demand is matched with the total number of GPU cards based on the total number of GPU cards and the single container demand indicated by the task request, so as to apply for corresponding GPU device resources based on the task request group.
[0085] It should be noted that the apparatus in the apparatus embodiments and the method embodiments are based on the same inventive concept.
[0086] This application provides an electronic device including a processor and a memory. The memory stores at least one instruction or at least one program segment, which is loaded and executed by the processor to implement the device resource application method provided in the above method embodiments.
[0087] Furthermore, Figure 8 shows a schematic diagram of the hardware structure of an electronic device for implementing the device resource application method provided in the embodiments of this application. The electronic device may participate in constituting, or may include, the device resource application apparatus provided in the embodiments of this application. As shown in Figure 8, the electronic device 80 may include one or more processors 802 (shown as 802a, 802b, ..., 802n in the figure) (processor 802 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 804 for storing data, and a transmission device 806 for communication functions. In addition, it may also include: a display, an input/output interface (I/O interface), a universal serial bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power supply, and/or a camera. Those skilled in the art will understand that the structure shown in Figure 8 is for illustrative purposes only and does not limit the structure of the electronic device described above. For example, the electronic device 80 may also include more or fewer components than shown in Figure 8, or have a configuration different from that shown in Figure 8.
[0088] It should be noted that the aforementioned one or more processors 802 and / or other data processing circuits are generally referred to herein as "data processing circuits". These data processing circuits may be embodied, in whole or in part, in software, hardware, firmware, or any other combination thereof. Furthermore, the data processing circuits may be a single, independent processing module, or may be wholly or partially integrated into any other element within the electronic device 80 (or mobile device). As involved in the embodiments of this application, the data processing circuit serves as a processor control mechanism (e.g., selection of a variable resistor termination path connected to an interface).
[0089] The memory 804 can be used to store software programs and modules of application software, such as the program instructions/data storage device corresponding to the device resource application method described in this application embodiment. The processor 802 executes various functional applications and data processing by running the software programs and modules stored in the memory 804, thereby realizing the aforementioned device resource application method. The memory 804 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 804 may further include memory remotely located relative to the processor 802, and these remote memories can be connected to the electronic device 80 via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
[0090] The transmission device 806 is used to receive or send data via a network. Specific examples of the network described above may include a wireless network provided by the communication provider of the electronic device 80. In one example, the transmission device 806 includes a Network Interface Controller (NIC), which can connect to other network devices via a base station to communicate with the Internet. In one embodiment, the transmission device 806 may be a radio frequency (RF) module for wireless communication with the Internet.
[0091] The display can be, for example, a touchscreen liquid crystal display (LCD), which allows a user to interact with the user interface of an electronic device 80 (or a mobile device).
[0092] Embodiments of this application also provide a computer-readable storage medium, which can be disposed in an electronic device to store at least one instruction or at least one program related to implementing a device resource application method in the method embodiment. The at least one instruction or the at least one program is loaded and executed by the processor to implement the device resource application method provided in the above method embodiment.
[0093] Optionally, in this embodiment, the storage medium may be located at at least one of the multiple network servers in a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing program code, such as USB flash drives, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.
[0094] It should be noted that the order of the embodiments described above is merely for descriptive purposes and does not represent the superiority or inferiority of the embodiments. Furthermore, the above description focuses on specific embodiments of this application. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps described in the claims can be performed in a different order than that shown in the embodiments and still achieve the desired results. Additionally, the processes depicted in the drawings do not necessarily require a specific or sequential order to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
[0095] The various embodiments in this application are described in a progressive manner. Similar or identical parts between embodiments can be referred to mutually. Each embodiment focuses on describing the differences from other embodiments. In particular, the device and electronic device embodiments are basically similar to the method embodiments, so the descriptions are relatively simple; relevant parts can be referred to the descriptions of the method embodiments.
[0096] Those skilled in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware or by a program instructing related hardware. The program can be stored in a computer-readable storage medium, such as a read-only memory, a disk, or an optical disk.
[0097] The above description is only a preferred embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.
Claims
1. A method for applying for equipment resources, characterized in that, The method includes: Determine the individual container demand for the GPU resource card indicated by the task requests in the current task request set; Determine the total number of GPUs in the device cluster; Based on the total GPU capacity and the single container demand indicated by the task request, a task request group is constructed whose sum of the indicated single container demands matches the total GPU capacity. The corresponding GPU device resources are then requested based on this task request group. The process involves: selecting the highest-priority task request from the current task request set and removing it from the set; each task request in the current task request set carries a corresponding priority. When the single container demand indicated by the currently selected task request is less than the total GPU capacity, the highest-priority task request is selected from the current task request set after removing the selected task request and removed from the set. The sum of the single container demands of all currently selected task requests is also calculated to obtain a target value. When the target value equals the total GPU capacity, the task request group is constructed based on all currently selected task requests. When the target value is less than the total GPU capacity, the process returns to the step of selecting the highest-priority task request from the current task request set after removing the selected task requests, continuing until the target value equals the total GPU capacity, at which point the task request group is constructed based on all currently selected task requests.
2. The method according to claim 1, characterized in that, After constructing, based on the total number of GPU cards in the machine and the single container demand indicated by the task requests, a task request group whose sum of indicated single container demands matches the total number of GPU cards, the method further includes: The target GPU device is determined based on the task request group; Based on the single container demand indicated by each task request in the task request group, a corresponding container is created on the target GPU device for each task request in the task request group.
3. The method according to claim 1, characterized in that, For all currently selected task requests, the task request with the highest priority is designated as the first task request, and the task request with the second highest priority is designated as the second task request. When only the first task request and the second task request are selected from all currently selected task requests, before constructing the task request group based on all currently selected task requests when the target value equals the total card quantity, the method includes: The number of first containers is determined based on the total demand for the GPU resource card and the demand for a single container as indicated in the first task request. The number of second containers is determined based on the total demand for the GPU resource card and the demand for a single container as indicated in the second task request. When the target value equals the total number of cards in the machine, the task request group is constructed based on all currently selected task requests, including: When the target value is equal to the total number of cards in the machine and the number of the first container is equal to the number of the second container, the task request group is constructed based on the first task request and the second task request. 
When the target value is equal to the total number of cards in the machine, and the number of the first container is greater than the number of the second container, the first task request is split into a first sub-task request indicating the number of the second container and a second sub-task request indicating the difference in number, and the task request group is constructed based on the first sub-task request and the second task request; When the target value is equal to the total number of cards in the machine, and the number of the first container is less than the number of the second container, the second task request is split into a third sub-task request indicating the number of the first container and a fourth sub-task request indicating the difference in the number, and the task request group is constructed based on the first task request and the third sub-task request; The quantity difference is the difference between the number of the first container and the number of the second container.
4. The method according to claim 1, characterized in that, The method further includes: Determine the interval between the current time and the time the task request was received; When the interval is less than the preset time, wait for the notification indicating that the task request group has been successfully constructed; When the interval is greater than or equal to a preset time, the corresponding GPU device resources are requested based on the task request.
5. The method according to claim 4, characterized in that, The step of requesting the corresponding GPU device resources based on the task request includes: Candidate GPU devices are identified from the device cluster using a saturation-first algorithm; Based on the total demand for GPU resource cards and the demand for individual containers indicated by the task request, the corresponding GPU device resources are determined from the candidate GPU devices.
6. The method according to claim 1, characterized in that, The method also includes receiving task requests: Receive the first type of task request submitted by the target object based on the interactive interface; Alternatively, receive a second type of task request generated periodically based on the target business; Alternatively, it can receive third-type task requests submitted via an interface.
7. A device for requesting equipment resources, characterized in that, The device includes: Task Request Response Module: Used to determine the demand for a single container on the GPU resource card indicated by the task request in the current task request set; Total GPU Quantity Determination Module: Used to determine the total number of GPUs in the device cluster; Task Request Group Construction Module: Used to construct a task request group whose sum of the indicated single container demand is matched with the total number of GPU cards and the total demand of a single container indicated by the task request, so as to apply for corresponding GPU device resources based on the task request group; Specifically, based on the total number of GPU cards in the system and the single container demand indicated by the task request, a task request group is constructed that matches the sum of the indicated single container demands with the total number of GPU cards in the system, including: The task request with the highest priority is selected from the current task request set and removed from the set. Each task request in the current task request set carries a corresponding priority. When the single container requirement indicated by the currently selected task request is less than the total number of cards, the highest priority task request is selected from the current task request set after the selected task request is removed and removed from the set, and the sum of the single container requirements of all currently selected task requests is calculated to obtain the target value. 
When the target value equals the total number of cards in the machine, the task request group is constructed based on all currently selected task requests; When the target value is less than the total number of cards in the machine, return to the step of selecting the highest priority task request from the current task request set after removing the selected task request and continue execution until the target value is equal to the total number of cards in the machine, and then construct the task request group based on all the currently selected task requests.
8. The apparatus according to claim 7, characterized in that, The device is also used for: After constructing a task request group that matches the sum of the indicated single container demand with the total number of GPU cards based on the total number of GPU devices and the single container demand indicated by the task request, the target GPU device is determined based on the task request group. Based on the individual container demand indicated by each task request in the task request group, the target GPU device creates a corresponding container for each task request in the task request group.
9. The apparatus according to claim 7, characterized in that, For all currently selected task requests, the task request with the highest priority is designated as the first task request, and the task request with the second highest priority is designated as the second task request. When only the first task request and the second task request are selected from all currently selected task requests, the device is further configured to: Before constructing the task request group based on all currently selected task requests when the target value equals the total number of GPU resource cards, the number of first containers is determined according to the total demand for GPU resource cards and the demand for a single container indicated by the first task request. The number of second containers is determined based on the total demand for the GPU resource card and the demand for a single container as indicated in the second task request. When the target value equals the total number of cards in the machine, the task request group is constructed based on all currently selected task requests, including: When the target value is equal to the total number of cards in the machine and the number of the first container is equal to the number of the second container, the task request group is constructed based on the first task request and the second task request. 
When the target value is equal to the total number of cards in the machine, and the number of the first container is greater than the number of the second container, the first task request is split into a first sub-task request indicating the number of the second container and a second sub-task request indicating the difference in number, and the task request group is constructed based on the first sub-task request and the second task request; When the target value is equal to the total number of cards in the machine, and the number of the first container is less than the number of the second container, the second task request is split into a third sub-task request indicating the number of the first container and a fourth sub-task request indicating the difference in the number, and the task request group is constructed based on the first task request and the third sub-task request; The quantity difference is the difference between the number of the first container and the number of the second container.
10. The apparatus according to claim 7, characterized in that, The device is also used for: Determine the interval between the current time and the time the task request was received; When the interval is less than the preset time, wait for the notification indicating that the task request group has been successfully constructed; When the interval is greater than or equal to a preset time, the corresponding GPU device resources are requested based on the task request.
11. The apparatus according to claim 10, characterized in that, The step of requesting the corresponding GPU device resources based on the task request includes: Candidate GPU devices are identified from the device cluster using a saturation-first algorithm; Based on the total demand for GPU resource cards and the demand for individual containers indicated by the task request, the corresponding GPU device resources are determined from the candidate GPU devices.
12. The apparatus according to claim 7, characterized in that, The device is also used to receive task requests: Receive the first type of task request submitted by the target object based on the interactive interface; Alternatively, receive a second type of task request generated periodically based on the target business; Alternatively, it can receive third-type task requests submitted via an interface.
13. An electronic device, characterized in that, The electronic device includes a processor and a memory, the memory storing at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by the processor to implement the device resource request method as described in any one of claims 1-6.
14. A computer-readable storage medium, characterized in that, The storage medium stores at least one instruction or at least one program segment, which is loaded and executed by a processor to implement the device resource request method as described in any one of claims 1-6.
15. A computer program product, characterized in that, The computer program product includes computer instructions stored in a computer-readable storage medium. The processor of the computer device reads and executes the computer instructions from the computer-readable storage medium to cause the computer device to perform the device resource request method as described in any one of claims 1-6.