Computing resource scheduling method and device

By creating master and slave functions on the cloud service platform and dynamically adjusting the resource specifications of the slave functions, the problem of the upper limit of computing power resources of a single physical machine is solved, realizing the expansion of computing power resources and the improvement of user experience.

CN122309100A · Pending · Publication Date: 2026-06-30 · PETAL CLOUD TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
PETAL CLOUD TECH CO LTD
Filing Date
2024-12-27
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing cloud service platforms cannot exceed the computing power limit of a single physical machine, nor can they dynamically adjust computing power resources, resulting in a poor user experience.

Method used

By creating master and slave functions on the cloud service platform, establishing a connection between the master and slave functions, and dynamically adjusting the resource specifications of the slave functions to expand computing resources, the resource limit of a single physical machine can be broken.

Benefits of technology

It has enabled the expansion of computing resources, broken through the resource limit of a single physical machine, improved the convenience of adjusting user resource specifications, and enhanced the user experience.

✦ Generated by Eureka AI based on patent content.


Abstract

This application relates to the field of artificial intelligence technology, and in particular to a computing power resource scheduling method and apparatus. This method can overcome the bottleneck of computing power resources of a single physical machine. In response to a first user instruction to create a function, the method displays first resource specification information; the first resource specification information describes the first resource specification corresponding to the user-created main function; in response to a second user instruction to enable a slave function, the method displays second resource specification information; the second resource specification information describes the second resource specification corresponding to at least one slave function attached to the main function; the total computing power obtained based on the first and second resource specifications is greater than the computing power limit corresponding to a single physical machine; the first and second resource specification information are uploaded to a server; the server is used to obtain a first container image corresponding to the main function and at least one second container image corresponding to at least one slave function; a cloud service program is deployed in the first container image.

Description

Technical Field

[0001] This application relates to the field of AI technology, and in particular to a method and apparatus for scheduling computing resources.

Background Technology

[0002] When performing inference or other types of tasks, artificial intelligence (AI) models may need to process large-scale data and perform high-concurrency computations. The computing efficiency of the central processing unit (CPU) is often insufficient to meet the computational power demands of such scenarios. Processors such as graphics processing units (GPUs) and neural network processing units (NPUs) can be used to assist the CPU in performing high-concurrency computations. Users can rent cloud servers with NPUs and / or GPUs from cloud service platforms to meet the computational resource requirements of AI models.

[0003] Currently, some cloud service platforms have failed to overcome the computing power resource limits of a single physical machine. For example, a single physical machine can currently accommodate a maximum of eight NPU or GPU physical cards, and some cloud service platforms are unable to overcome the performance bottleneck of eight physical cards on a single physical machine.

Summary of the Invention

[0004] This application provides a computing resource scheduling method and apparatus that can break through the computing resource limit of a single physical machine and realize cross-machine scheduling.

[0005] In a first aspect, embodiments of this application provide a computing resource scheduling method, the method comprising: a terminal device responding to a first user instruction to create a function and displaying first resource specification information; the first resource specification information describing the first resource specification corresponding to a main function created by the user; the terminal device responding to a second user instruction to enable a slave function and displaying second resource specification information; the second resource specification information describing the second resource specification corresponding to at least one slave function attached to the main function; the total computing power obtained based on the first and second resource specifications being greater than the computing power limit corresponding to a single physical machine; the terminal device uploading the first and second resource specification information to a server; the server obtaining a first container image corresponding to the main function and at least one second container image corresponding to at least one slave function; a cloud service program being deployed in the first container image; the server starting a first container instance conforming to the first resource specification information based on the first container image, and starting at least one second container instance conforming to the second resource specification information based on at least one second container image; and establishing a connection between the first container instance and at least one second container instance; the server responding to a call request for the cloud service program, decomposing the computing tasks corresponding to the cloud service program, and distributing them to the first container instance and at least one second container instance through the connection.

[0006] The method proposed in this application allows users to create a main function with a first resource specification and several slave functions with a second resource specification at the same time. After the container instances corresponding to the main function and the container instances corresponding to the slave functions are connected, they can interact with each other. The computing tasks that were originally undertaken by the main function alone can be split into multiple parts, and some of the computing tasks can be undertaken by the slave functions. Moreover, the container instances of the main function and the container instances of the slave functions can compute in parallel. This is equivalent to expanding the range of available computing power resources from the original first resource specification (that is, the resource specification of a single physical machine) to the sum of the first and second resource specifications, thereby achieving the purpose of expanding computing power resources and no longer being limited by the resource limit of a single physical machine.
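The capacity argument in the preceding paragraph can be illustrated with a minimal Python sketch. It assumes that computing power is measured in accelerator cards and uses the eight-card figure from the background section as the single-machine limit; the names `ResourceSpec`, `total_computing_power`, and `exceeds_single_machine` are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass

# Assumption: the background section cites eight NPU/GPU cards as the
# current maximum for one physical machine.
SINGLE_MACHINE_CARD_LIMIT = 8


@dataclass(frozen=True)
class ResourceSpec:
    """Hypothetical resource specification, measured in accelerator cards."""
    cards_per_function: int
    function_count: int = 1

    @property
    def total_cards(self) -> int:
        return self.cards_per_function * self.function_count


def total_computing_power(main: ResourceSpec, slaves: ResourceSpec) -> int:
    """Sum of the first (main) and second (slave) resource specifications."""
    return main.total_cards + slaves.total_cards


def exceeds_single_machine(main: ResourceSpec, slaves: ResourceSpec,
                           limit: int = SINGLE_MACHINE_CARD_LIMIT) -> bool:
    """True when the combined specification is above the single-machine limit."""
    return total_computing_power(main, slaves) > limit


# One main function with 8 cards plus three slave functions with 8 cards each.
main_spec = ResourceSpec(cards_per_function=8)
slave_spec = ResourceSpec(cards_per_function=8, function_count=3)
print(total_computing_power(main_spec, slave_spec))   # 32
print(exceeds_single_machine(main_spec, slave_spec))  # True
```

In this sketch the main function alone is capped at the single-machine limit, and each slave function adds its own specification on top of it.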

[0007] In one possible implementation, the terminal device, in response to a third user instruction to edit a created function, displays third resource specification information; the third resource specification information is obtained by the user modifying the second resource specification information, and is used to describe the third resource specification corresponding to at least one modified slave function; the computing power corresponding to the third resource specification is greater than the computing power corresponding to the second resource specification; the terminal device uploads the third resource specification information to the server; the server, based on at least one second container image, starts at least one third container instance conforming to the third resource specification information; a connection is established between the first container instance and the at least one third container instance; in response to a call request for the cloud service program, the server decomposes the computing tasks corresponding to the cloud service program and distributes them through the connection to the first container instance and the at least one third container instance.

[0008] The method proposed in this application supports dynamic modification of resource specifications based on an existing function. This eliminates the need for redeployment or function re-creation; simply editing the resource specifications of the slave function based on the existing function is sufficient to dynamically adjust computing resources, improving the ease of modification and enhancing the user experience. In some embodiments, when editing an existing function, in addition to modifying the resource specifications of the slave function, the resource specifications of the main function can also be modified.

[0009] In a second aspect, embodiments of this application also provide a computing resource scheduling method. This method is applied to a server, which deploys a first container image corresponding to a main function and at least one second container image corresponding to at least one slave function. A cloud service program is deployed in the first container image. The method includes: the server receiving first resource specification information and second resource specification information uploaded by a first terminal device; the first resource specification information describes the first resource specification corresponding to the main function created by the user; the second resource specification information describes the second resource specification corresponding to at least one slave function attached to the main function; the total computing power obtained based on the first and second resource specifications is greater than the computing power limit corresponding to a single physical machine; the server starts a first container instance conforming to the first resource specification information based on the first container image, and starts at least one second container instance conforming to the second resource specification information based on at least one second container image; the server establishes a connection between the first container instance and at least one second container instance; in response to a call request to the cloud service program, the server decomposes the computing tasks corresponding to the cloud service program and distributes them to the first container instance and at least one second container instance through the connection.

[0010] The method proposed in this application can expand computing resources and break through the resource limit of a single physical machine.
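The decomposition and distribution step of the second aspect can be sketched as follows. This is only an illustration under stated assumptions: the application does not specify how tasks are split, so the sketch assumes an even contiguous split of a list of work items, and it stands in for the container instances with plain Python callables run on a thread pool. The names `decompose`, `dispatch`, and `instance_compute` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def decompose(items: Sequence[int], parts: int) -> List[List[int]]:
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


def dispatch(items: Sequence[int],
             workers: List[Callable[[List[int]], int]]) -> int:
    """Send one chunk to each worker in parallel and merge the partial results."""
    chunks = decompose(items, len(workers))
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = [pool.submit(w, c) for w, c in zip(workers, chunks)]
        return sum(f.result() for f in futures)


def instance_compute(chunk: List[int]) -> int:
    """Stand-in for the work one container instance performs on its share."""
    return sum(x * x for x in chunk)


# One main instance plus two slave instances, all modelled by the same callable.
workers = [instance_compute] * 3
result = dispatch(list(range(10)), workers)
print(result)  # 285
```

A real deployment would send each chunk over the established connection rather than call a local function; the splitting and merging logic is the part the sketch is meant to show.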

[0011] In one possible implementation, before receiving the first resource specification information and the second resource specification information uploaded by the first terminal device, the method further includes: the server responding to a first acquisition request from the second terminal device by sending a first operating system image to the second terminal device; wherein the first operating system image is an operating system image integrating a master-slave function interaction module; the master-slave function interaction module is used to establish a connection between a first container instance and at least one second container instance, and in response to a call request for a cloud service program, decompose the computing tasks corresponding to the cloud service program, and distribute them to the first container instance and at least one second container instance through the connection; the server receiving the first container image uploaded by the second terminal device; wherein the first container image is obtained by installing a cloud service program in the first operating system image; the cloud service program integrates a trained artificial intelligence (AI) model; the cloud service program is used to provide AI services to users by calling the AI model.

[0012] In one possible implementation, before receiving the first resource specification information and the second resource specification information uploaded by the first terminal device, the method further includes: the server responding to a second acquisition request from the second terminal device by sending a software development kit (SDK) integrating a master-slave function interaction module to the second terminal device; the master-slave function interaction module is used to establish a connection between a first container instance and at least one second container instance, and in response to a call request for a cloud service program, decompose the computing tasks corresponding to the cloud service program and distribute them to the first container instance and at least one second container instance through the connection; the server receiving a first container image uploaded by the second terminal device; the first container image is obtained by installing the cloud service program in an operating system image; the cloud service program integrates the SDK and a trained AI model; the cloud service program is used to provide AI services to users by calling the AI model.

[0013] In one possible implementation, a function management module is deployed in the server. Before the server establishes a connection between the first container instance and at least one second container instance, the method further includes: the function management module transmitting at least one set of network information corresponding to at least one second container instance to the first container instance; one set of information in the at least one set of network information includes one or more of Internet Protocol IP address information and port number corresponding to a second container instance; the server establishing a connection between the first container instance and at least one second container instance includes: the server triggering the first container instance to initiate a connection request to the corresponding at least one second container instance based on the at least one set of network information; each of the at least one second container instance, in response to the connection request, obtaining configuration information corresponding to its own container instance; the configuration information includes resource specification information corresponding to a second container instance; if the first container instance determines that the configuration information conforms to the second resource specification information, it issues a certificate to be verified to the second container instance; the second container instance verifies the certificate, and if the verification is successful, a connection is established between the first container instance and at least one second container instance.

[0014] In one possible implementation, a master-slave function interaction module is deployed in the first container image, and a slave function program is deployed in the second container image; the configuration information also includes version information of the slave function program corresponding to a second container instance; issuing a certificate to be verified to the second container instance if the first container instance determines that the configuration information conforms to the second resource specification information includes: if the first container instance determines that the configuration information conforms to the second resource specification information and that the version information of the slave function program is compatible with the version information of the master-slave function interaction module, issuing a certificate to be verified to the second container instance.
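The connection handshake described in the two preceding implementations can be sketched in a few lines of Python. The application does not say what form the certificate takes or what "compatible" versions means, so two assumptions are made here and marked in the comments: the certificate is modelled as an HMAC-SHA256 tag over the slave's network address using a key both sides hold, and versions count as compatible when their major numbers match. All class and function names are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaveConfig:
    address: str          # "ip:port" network information for the slave instance
    cards: int            # resource specification reported by the slave instance
    program_version: str  # version of the slave function program


def sign(key: bytes, address: str) -> str:
    # Assumption: an HMAC tag stands in for the "certificate to be verified".
    return hmac.new(key, address.encode("utf-8"), hashlib.sha256).hexdigest()


class SlaveInstance:
    def __init__(self, config: SlaveConfig, key: bytes) -> None:
        self.config = config
        self._key = key

    def report_config(self) -> SlaveConfig:
        """Return this instance's own configuration in response to a connection request."""
        return self.config

    def verify_certificate(self, certificate: str) -> bool:
        """Check the certificate issued by the main function's instance."""
        return hmac.compare_digest(sign(self._key, self.config.address), certificate)


def versions_compatible(master_version: str, slave_version: str) -> bool:
    # Assumption: equal major version numbers mean "compatible".
    return master_version.split(".")[0] == slave_version.split(".")[0]


def connect(slave: SlaveInstance, expected_cards: int,
            master_version: str, master_key: bytes) -> bool:
    """Main-instance side: check the configuration, then issue a certificate."""
    config = slave.report_config()
    if config.cards != expected_cards:
        return False  # configuration does not match the second resource specification
    if not versions_compatible(master_version, config.program_version):
        return False  # slave function program version is not compatible
    return slave.verify_certificate(sign(master_key, config.address))


key = b"demo-key"
slave = SlaveInstance(SlaveConfig("10.0.0.2:9000", 8, "1.4.0"), key)
print(connect(slave, expected_cards=8, master_version="1.2.0", master_key=key))  # True
```

The connection is treated as established only when all three checks pass: specification match, version compatibility, and certificate verification on the slave side.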

[0015] In one possible implementation, the server receives third resource specification information uploaded by a first terminal device; the third resource specification information is obtained by the user modifying the second resource specification information, and is used to describe the third resource specification corresponding to at least one modified slave function; the computing power corresponding to the third resource specification is greater than the computing power corresponding to the second resource specification; the server starts at least one third container instance conforming to the third resource specification information based on at least one second container image; a connection is established between the first container instance and at least one third container instance; in response to a call request for a cloud service program, the server decomposes the computing tasks corresponding to the cloud service program through the connection and distributes them to the first container instance and at least one third container instance.

[0016] In one possible implementation, each of the second resource specification information and the third resource specification information includes the number of slave functions and the resource specification corresponding to a single slave function; launching at least one third container instance conforming to the third resource specification information includes: if it is determined that the resource specification of a single slave function has changed, scaling down and deleting all instances in at least one second container instance; launching at least one third container instance according to the third resource specification information with the corresponding number of slave functions and the corresponding resource specification of a single slave function; or, if it is determined that the resource specification corresponding to a single slave function has not changed and the number of slave functions has changed, adding or deleting second container instances to obtain at least one third container instance conforming to the number of slave functions in the third resource specification information.
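The two branches of this implementation can be sketched as a small planning function. This is an illustrative sketch rather than the application's implementation: instances are represented by string identifiers, the function only returns a plan instead of starting or deleting containers, and the names `SlaveSpec` and `reconcile` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class SlaveSpec:
    count: int               # number of slave functions
    cards_per_function: int  # resource specification of a single slave function


Plan = Dict[str, Union[List[str], int]]


def reconcile(running: List[str], old: SlaveSpec, new: SlaveSpec) -> Plan:
    """Plan how to move from the second to the third resource specification."""
    if new.cards_per_function != old.cards_per_function:
        # Per-function specification changed: delete every running instance
        # and start the full new set.
        return {"delete": list(running), "keep": [], "start": new.count}
    if new.count >= len(running):
        # Only the count grew (or nothing changed): keep what runs, add the rest.
        return {"delete": [], "keep": list(running),
                "start": new.count - len(running)}
    # Only the count shrank: keep the first `new.count` instances, delete the rest.
    return {"delete": running[new.count:], "keep": running[:new.count], "start": 0}


running = ["slave-0", "slave-1"]
old_spec = SlaveSpec(count=2, cards_per_function=4)
print(reconcile(running, old_spec, SlaveSpec(count=2, cards_per_function=8)))
print(reconcile(running, old_spec, SlaveSpec(count=3, cards_per_function=4)))
```

Which instances to delete when the count shrinks is not stated in the application; keeping the earliest ones is an arbitrary choice made for the sketch.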

[0017] In a third aspect, embodiments of this application also provide a computing resource scheduling method, which is applied to a terminal device. The method includes: the terminal device responding to a first user instruction to create a function and displaying first resource specification information; the first resource specification information is used to describe the first resource specification corresponding to the main function created by the user; the terminal device responding to a second user instruction to enable a slave function and displaying second resource specification information; the second resource specification information is used to describe the second resource specification corresponding to at least one slave function attached to the main function; the total computing power obtained based on the first resource specification and the second resource specification is greater than the computing power limit corresponding to a single physical machine; and the terminal device uploading the first resource specification information and the second resource specification information to a server.

[0018] In one possible implementation, the method provided by the third aspect further includes: the terminal device responding to a third user instruction to edit the created function, displaying third resource specification information; the third resource specification information is obtained by the user modifying the second resource specification information, and is used to describe the third resource specification corresponding to at least one modified slave function; the computing power corresponding to the third resource specification is greater than the computing power corresponding to the second resource specification; and the terminal device uploading the third resource specification information to the server.

[0019] In a fourth aspect, embodiments of this application also provide a computing resource scheduling system, including a terminal device and a server. The terminal device includes: a front-end interaction module, used to display first resource specification information in response to a first user instruction to create a function; the first resource specification information describes the first resource specification corresponding to the user-created main function; and to display second resource specification information in response to a second user instruction to enable a slave function; the second resource specification information describes the second resource specification corresponding to at least one slave function attached to the main function; the total computing power obtained based on the first and second resource specifications is greater than the computing power limit corresponding to a single physical machine; and an upload module, used to upload the first and second resource specification information to the server.
The server deploys a first container image corresponding to a main function and at least one second container image corresponding to at least one slave function; the first container image deploys a cloud service program; the server specifically includes: a receiving module, used to receive first resource specification information and second resource specification information uploaded by a first terminal device; the first resource specification information is used to describe the first resource specification corresponding to the main function created by the user; the second resource specification information is used to describe the second resource specification corresponding to at least one slave function attached to the main function; the total computing power obtained based on the first resource specification and the second resource specification is greater than the computing power limit corresponding to a single physical machine; a function management module, used to start a first container instance conforming to the first resource specification information based on the first container image, and to start at least one second container instance conforming to the second resource specification information based on at least one second container image; a master-slave function interaction module, used to establish a connection between the first container instance and at least one second container instance, and in response to a call request to the cloud service program, to decompose the computing tasks corresponding to the cloud service program, and distribute them to the first container instance and at least one second container instance through the connection.

[0020] In a fifth aspect, embodiments of this application also provide a terminal device, comprising: a front-end interaction module, configured to display first resource specification information in response to a first user instruction to create a function; the first resource specification information describing the first resource specification corresponding to the main function created by the user; and, in response to a second user instruction to enable a slave function, display second resource specification information; the second resource specification information describing the second resource specification corresponding to at least one slave function attached to the main function; the total computing power obtained based on the first and second resource specifications being greater than the computing power limit corresponding to a single physical machine; and an upload module, configured to upload the first and second resource specification information to a server.

[0021] In a sixth aspect, embodiments of this application also provide a server, in which a first container image corresponding to a main function and at least one second container image corresponding to at least one slave function are deployed; a cloud service program is deployed in the first container image; the server includes: a receiving module, used to receive first resource specification information and second resource specification information uploaded by a first terminal device; the first resource specification information is used to describe the first resource specification corresponding to the main function created by the user; the second resource specification information is used to describe the second resource specification corresponding to at least one slave function attached to the main function; the total computing power obtained based on the first resource specification and the second resource specification is greater than the computing power limit corresponding to a single physical machine; a function management module, used to start a first container instance conforming to the first resource specification information based on the first container image, and to start at least one second container instance conforming to the second resource specification information based on at least one second container image; a master-slave function interaction module, used to establish a connection between the first container instance and at least one second container instance, and in response to a call request to the cloud service program, to decompose the computing tasks corresponding to the cloud service program, and distribute them to the first container instance and at least one second container instance through the connection.

[0022] In one possible implementation, the server further includes a distribution module, configured to distribute a first operating system image to the second terminal device in response to a first acquisition request from the second terminal device; wherein the first operating system image is an operating system image integrating a master-slave function interaction module; the master-slave function interaction module is configured to establish a connection between a first container instance and at least one second container instance, and in response to a request to call a cloud service program, decompose the computing tasks corresponding to the cloud service program and distribute them to the first container instance and at least one second container instance through the connection; the receiving module is further configured to receive the first container image uploaded by the second terminal device; wherein the first container image is obtained by installing a cloud service program on the first operating system image; the cloud service program integrates a trained artificial intelligence (AI) model; the cloud service program is used to provide AI services to users by calling the AI model.

[0023] In one possible implementation, the server further includes a distribution module, used to distribute a software development kit (SDK) integrating a master-slave function interaction module to the second terminal device in response to a second acquisition request from the second terminal device; the master-slave function interaction module is used to establish a connection between the first container instance and at least one second container instance, and in response to a call request for a cloud service program, to decompose the computing tasks corresponding to the cloud service program and distribute them to the first container instance and at least one second container instance through the connection; the receiving module is further used to receive a first container image uploaded by the second terminal device; the first container image is obtained by installing the cloud service program in an operating system image; the cloud service program integrates the SDK and a trained AI model; the cloud service program is used to provide AI services to users by calling the AI model.

[0024] In one possible implementation, the master-slave function interaction module is integrated into the first container image, and the slave function program is integrated into the second container image. Before establishing a connection between the first container instance and at least one second container instance, the function management module is further configured to: transmit at least one set of network information corresponding to at least one second container instance to the first container instance; one set of information in the at least one set of network information includes one or more of the Internet Protocol IP address information and port number corresponding to a second container instance; when establishing a connection between the first container instance and at least one second container instance, the master-slave function interaction module in the first container instance is specifically configured to: initiate a connection request to the corresponding at least one second container instance based on the at least one set of network information; the slave function program in each of the at least one second container instance is specifically configured to: in response to the connection request, obtain the configuration information corresponding to its own container instance; the configuration information includes resource specification information corresponding to a second container instance; the master-slave function interaction module in the first container instance is further configured to: if the configuration information conforms to the second resource specification information, issue a certificate to be verified to the second container instance; the slave function program in the second container instance is further configured to verify the certificate, and if the verification is successful, a connection is established between the first container instance and at least one second container instance.

[0025] In one possible implementation, the configuration information also includes version information of the slave function program corresponding to the second container instance; when issuing a certificate to be verified to the second container instance upon determining that the configuration information conforms to the second resource specification information, the master-slave function interaction module in the first container instance is specifically used to: when it is determined that the configuration information conforms to the second resource specification information and the version information of the slave function program is compatible with the version information of the master-slave function interaction module, send a certificate to be verified to the slave function program in the second container instance.

[0026] In one possible implementation, the receiving module is further configured to: receive third resource specification information uploaded by the first terminal device; the third resource specification information is obtained by the user modifying the second resource specification information, and is used to describe the third resource specification corresponding to at least one slave function after modification; the computing power corresponding to the third resource specification is greater than the computing power corresponding to the second resource specification; the function management module is further configured to: start at least one third container instance that conforms to the third resource specification information based on at least one second container image; the master-slave function interaction module is further configured to: establish a connection between the first container instance and at least one third container instance; in response to a call request for a cloud service program, decompose the computing task corresponding to the cloud service program through the connection, and distribute it to the first container instance and at least one third container instance.

[0027] In one possible implementation, each of the second and third resource specification information includes the number of slave functions and the resource specification corresponding to a single slave function. When starting at least one third container instance that conforms to the third resource specification information, the function management module is specifically configured to: if it is determined that the resource specification of a single slave function has changed, shrink and delete all instances in at least one second container instance; according to the third resource specification information, start at least one third container instance with the corresponding number of slave functions and the corresponding resource specification of a single slave function; or, if it is determined that the resource specification corresponding to a single slave function has not changed but the number of slave functions has changed, add or delete second container instances to obtain at least one third container instance that conforms to the number of slave functions in the third resource specification information.

[0028] In a seventh aspect, embodiments of this application also provide an electronic device, the electronic device comprising: a processor, the processor being configured to execute a computer program or instructions in a memory to implement the method as described in any one of the first to third aspects above.

[0029] In an eighth aspect, embodiments of this application also provide a computing device cluster, including at least one computing device, each computing device including a processor and a memory; the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device to cause the computing device cluster to perform the method as described in any one of the first to third aspects above.

[0030] In a ninth aspect, embodiments of this application also provide a computer program product containing instructions that, when executed by a cluster of computing devices, cause the cluster of computing devices to perform the method as described in any one of the first to third aspects above.

[0031] In a tenth aspect, embodiments of this application also provide a computer-readable storage medium including computer program instructions, which, when executed by a cluster of computing devices, perform the method as described in any one of the first to third aspects above.

[0032] In an eleventh aspect, embodiments of this application also provide a chip system, including: a communication interface for inputting and / or outputting data; and a processor for executing a computer-executable program, causing a device equipped with the chip system to perform the method as described in any one of the first to third aspects above.

Description of the Drawings

[0033] Figure 1 is a schematic diagram of the structure of the electronic device provided in the embodiments of this application;

[0034] Figure 2 is a schematic diagram of the server structure provided in an embodiment of this application;

[0035] Figure 3 is an example diagram of the system architecture of the computing resource scheduling method provided in the embodiments of this application;

[0036] Figure 4 is a flowchart, in three parts, of the computing resource scheduling method provided in the embodiments of this application;

[0037] Figure 5 is an interaction flowchart of the computing resource scheduling method provided in the embodiments of this application;

[0038] Figure 6a is a schematic diagram illustrating a first method of obtaining a custom container image in the computing resource scheduling method provided in the embodiments of this application;

[0039] Figure 6b is a schematic diagram illustrating a second method of obtaining a custom container image in the computing resource scheduling method provided in the embodiments of this application;

[0040] Figure 7 is an example diagram of the function management interface in the computing resource scheduling method provided in the embodiments of this application;

[0041] Figure 8 is a schematic diagram of the master-slave function connection in some embodiments of the computing resource scheduling method provided in this application;

[0042] Figure 9 is a comparative diagram of the distribution locations of master-slave function interaction modules in some embodiments of the computing resource scheduling method provided in this application;

[0043] Figure 10 is a schematic diagram of the master-slave function connection in some embodiments of the computing resource scheduling method provided in this application;

[0044] Figure 11 is an example diagram of the function management interface for dynamically modifying slave functions (or also modifying the main function) in some embodiments of the computing resource scheduling method provided in this application;

[0045] Figure 12 is a flowchart illustrating the process of dynamically modifying slave function configuration information in some embodiments of the computing resource scheduling method provided in this application;

[0046] Figure 13 is a flowchart illustrating the computing resource scheduling method provided in the embodiments of this application from the perspective of the terminal device;

[0047] Figure 14 is a flowchart illustrating the computing resource scheduling method provided in the embodiments of this application from the perspective of the server side;

[0048] Figure 15 is a schematic diagram of a computing device provided in an embodiment of this application;

[0049] Figure 16 is a schematic diagram of a computing device cluster provided in an embodiment of this application;

[0050] Figure 17 is a schematic diagram showing that one or more computing devices in a computing device cluster provided in an embodiment of this application can be connected via a network.

Detailed Description

[0051] The technical solutions in the embodiments of this application will now be described with reference to the accompanying drawings. Obviously, the described embodiments are only a part of, and not all, of the embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without creative effort are within the scope of protection of this application.

[0052] In the description of the embodiments of this application, unless otherwise stated, " / " means "or". For example, A / B can mean A or B. The word "and / or" in the text is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, and B existing alone. Furthermore, in the description of the embodiments of this application, "multiple" refers to two or more.

[0053] Hereinafter, the terms "first" and "second" are used for descriptive purposes only and should not be construed as implying or suggesting relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.

[0054] In the embodiments of this application, the words "exemplary" or "for example" are used to indicate that they are examples, illustrations, or descriptions. Any embodiment or design that is described as "exemplary" or "for example" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or design options.

[0055] AI services (Artificial Intelligence Services) refer to various functions and services provided using artificial intelligence technology, designed to automate and intelligently solve specific problems or complete specific tasks. AI services can be embedded in various applications to help users improve efficiency, reduce costs, and enhance user experience.

[0056] The process of an AI service developer (first user) providing an AI service can be divided into the following stages:

[0057] Phase One: AI Model Development and Training. In this phase, it is necessary to determine a suitable AI algorithm and model architecture, write the code, and train the AI model to obtain a trained AI model.

[0058] Phase Two: AI Model Deployment. The trained AI model is integrated into a cloud service application, which is then deployed to the production environment, and an API for external invocation is provided. This gives developers an externally accessible AI service, allowing users to call the AI model within the cloud service application through other applications or services. The cloud service application can be an application running in a cloud environment, or a software module with certain functions, that uses cloud infrastructure to provide services.

[0059] Phase Three: AI Model Inference. Users (AI service users, second users) can use other applications or services to call, via the API, the cloud service program that integrates the AI model. The input data is passed to the model, and the output results are returned to the user through the API.

[0060] The three stages mentioned above do not end after one round of execution, but rather they are continuously iterated. During the iteration process, new data is continuously provided, model parameters are adjusted, and algorithms are optimized, thereby improving the efficiency and accuracy of the model and providing better service.

[0061] In response to user requests, AI models need to perform one or more types of tasks, such as inference, prediction, and classification. Taking inference as an example, the inference process must handle large-scale data and highly concurrent computation. In this scenario, the CPU has low computational efficiency and struggles to achieve highly concurrent computation.

[0062] GPUs and NPUs are hardware devices used to accelerate AI inference. GPUs were originally designed for graphics rendering, while NPUs are specifically designed for AI inference, with their architecture optimized for neural network computation. In practical applications, both can perform parallel computing and support deep learning frameworks. The choice between GPUs and NPUs depends on the specific computing task and application scenario. For scenarios requiring extensive neural network computation, NPUs typically offer better performance and efficiency; while for scenarios requiring graphics computing or general-purpose computing, such as file compression or video transcoding, GPUs may be a better choice. AI service developers (hereinafter referred to as developers) can deploy resources on cloud service platforms that provide cloud server rental services, specifying whether the resource type used is GPU or NPU.

[0063] Specifically, to facilitate understanding of the technical solutions in the embodiments of this application, two resource deployment methods from related technologies are listed:

[0064] Resource deployment method one:

[0065] Based on the interactive interface provided by the cloud service platform, developers can rent cloud servers with NPUs and / or GPUs from the cloud service platform by operating through the interactive interface. The cloud servers can be physical machines or virtual machines.

[0066] When renting, developers can select specific server resource specifications and operating system images. Server resource specifications include basic resource information such as CPU and memory, as well as specific card types and quantities. Card types can be GPUs or NPUs, or other types such as TPUs. The number of cards can be selected from 1 to 8, with a maximum of 8 physical cards per physical machine. GPU physical cards refer to physical cards containing a GPU, while NPU physical cards refer to physical cards containing an NPU. For example, if a developer selects GPU as the card type and 6 GPU cards, it means the developer needs to rent a physical machine or virtual machine with 6 GPU cards.
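The specification choices described above can be modeled as a small validated record. This is a minimal sketch under stated assumptions: the class name `ServerSpec`, its field names, and the example CPU and memory values are hypothetical; only the card types and the limit of 8 cards per physical machine come from the text.

```python
from dataclasses import dataclass

MAX_CARDS_PER_MACHINE = 8            # upper limit per physical machine stated in the text
CARD_TYPES = {"GPU", "NPU", "TPU"}   # card types mentioned in the text


@dataclass(frozen=True)
class ServerSpec:
    """Hypothetical server resource specification chosen when renting."""
    cpu_cores: int
    memory_gb: int
    card_type: str
    card_count: int

    def __post_init__(self):
        # Reject card types the text does not mention.
        if self.card_type not in CARD_TYPES:
            raise ValueError(f"unsupported card type: {self.card_type}")
        # A single machine offers between 1 and 8 cards.
        if not 1 <= self.card_count <= MAX_CARDS_PER_MACHINE:
            raise ValueError("card_count must be between 1 and 8")


# The example from the text: a machine with 6 GPU cards (CPU and memory values are made up).
spec = ServerSpec(cpu_cores=32, memory_gb=256, card_type="GPU", card_count=6)
```

Requesting 9 cards in this sketch raises a `ValueError`, which mirrors the single-machine ceiling that the application sets out to overcome.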

[0067] For the operating system image, developers can choose a public image provided by the cloud service platform or a private image they have built themselves.

[0068] Developers log in to the rented cloud server and deploy their pre-prepared applications or services, which integrate trained AI models, to the cloud server.

[0069] Developers provide APIs for calling the deployed applications or services, and users can use the AI services provided by the developers by calling these APIs.

[0070] The above resource deployment method one has the following problems:

[0071] Currently, cloud service platforms only support deployment on a single machine (i.e., a single physical machine), and cross-machine scheduling is not possible. Furthermore, due to hardware limitations, a single machine can only support a maximum of eight physical cards; deployment beyond eight cards is not supported, resulting in a significant performance ceiling. Meanwhile, many AI models have increasingly higher demands for computing resources. In addition, deployed inference services do not support dynamic changes in the number of cards. When the computing power requirements of an AI model change, the entire process of resource leasing and environment deployment must be redone. In other words, when developers iterate on their trained AI models and need to change the GPU and / or NPU computing power requirements, the dependent GPU / NPU specifications or number of cards cannot be modified. Developers must re-rent new machine resources and complete all the above processes again, which is cumbersome and the user experience needs improvement.

[0072] Resource deployment method two:

[0073] In another current deployment method, after training the model, AI service developers can deploy it on a platform provided by a cloud service provider by following this process:

[0074] The trained AI model is integrated into the application or cloud service program, and a container image is built on top of a base operating system image. This image is then uploaded to the cloud service provider's image repository. For example, developers can use the Docker container tool to deploy an application that integrates the AI model on top of a publicly released Debian 9 operating system image, creating a custom container image.

[0075] Next, in the interactive interface provided by the cloud service provider, create a function. Select a function type that supports GPU and / or NPU capabilities. For example, if the resource corresponding to a Web function is GPU and / or NPU, then select Web function. For the runtime environment, select a custom container image that supports GPU / NPU capabilities. Specifically, when selecting the container image, choose the custom container image created in the previous step from the image repository.

[0076] In deployment method two, the created functions run as containers, and each function can be considered a container instance. This deployment method virtualizes resources such as CPU, GPU, and memory. The GPU can be selected by card type and video memory; specifically, a single GPU card can be split into multiple parts, each provided to a different container instance. During splitting, GPU video memory and GPU computing power remain proportional. Therefore, the maximum selectable GPU specification for a single container instance (or a single function) is equivalent to the computing power and video memory of a single GPU card. For example, the maximum GPU specification is 16GB for Tesla series instances, 24GB for Ampere series instances, and 48GB for Ada series instances. Thus, the maximum selectable GPU specification for a single function is 16GB, 24GB, or 48GB, depending on the series. The minimum selectable GPU specification for a single function is 1GB of video memory with the proportionally corresponding computing power.
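The proportional relationship between video memory and computing power can be illustrated with a short calculation. This is a sketch only: the function name `compute_share` and the dictionary `MAX_MEMORY_GB` are hypothetical, and the per-series limits are simply the figures quoted in the paragraph above.

```python
# Maximum video memory of one card per series, as quoted in the text (GB).
MAX_MEMORY_GB = {"Tesla": 16, "Ampere": 24, "Ada": 48}
MIN_MEMORY_GB = 1  # minimum selectable video memory for a single function


def compute_share(series: str, requested_gb: int) -> float:
    """Return the fraction of one card's computing power that a function receives.

    Assumes computing power is split in the same proportion as video memory.
    """
    card_gb = MAX_MEMORY_GB[series]
    if not MIN_MEMORY_GB <= requested_gb <= card_gb:
        raise ValueError(f"{series} functions must request 1 to {card_gb} GB")
    return requested_gb / card_gb


half_tesla = compute_share("Tesla", 8)     # half of a 16GB card
full_ampere = compute_share("Ampere", 24)  # an entire 24GB card
```

Under this assumption, a request can never exceed a share of 1.0, which is exactly the single-card ceiling of deployment method two.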

[0077] Next, after creating the function, developers can expose the API interface to provide AI services to the outside world.

[0078] Deployment method two has the following problems:

[0079] The resource specifications corresponding to the functions provided by the cloud service platform support only a fixed number of cards (usually at most one card), so there is a performance limit.

[0080] When it is necessary to change the specifications of GPU / NPU computing power, the specifications or number of GPUs that are relied upon cannot be modified. Developers need to create new functions, select the corresponding resource specifications, and provide services to the outside world.

[0081] Based on the deployment methods listed above, it can be seen that in the cloud server rental services currently provided by cloud service platforms, the maximum selectable resource specification in deployment method two is the full computing power of a single GPU card or NPU card, while the maximum selectable resource specification in deployment method one is 8 GPU cards or NPU cards. Both have obvious performance limits and cannot meet the needs of models with computing power resource requirements exceeding 8 cards.

[0082] In other words, when using NPU / GPU for inference, a single server instance can only be bound to a fixed number of NPU / GPU cards, typically a maximum of 8, and its capacity has an upper limit. If performance is still insufficient even with 8 cards, additional custom code is required to schedule between multiple physical machines. Currently, cloud service platforms cannot help developers break through the single-machine NPU / GPU card limit.

[0083] Furthermore, deployment methods one and two show that a deployed inference service does not support dynamic changes to resource specifications. When resource specifications need to be changed, the entire process must be re-executed or a new function must be created, which is cumbersome. Flexibility and adaptability need to be improved, as does the experience of the user (the AI service developer).

[0084] When using NPU / GPU for inference, the model's performance is strongly correlated with the NPU / GPU computing power of the machine resources. If the computing power specification needs to be changed, the inference service must be taken offline from the current server and then redeployed on a server with a different specification. The process is complicated, and taking the service offline and bringing it back online affects the experience of the user (the AI service user).

[0085] In view of this, this application proposes a computing resource scheduling method that allows an AI service developer U1 to create a main function and slave functions in the interactive interface provided by the cloud service platform, and to set or select the resource specifications corresponding to the main function and the slave functions. The main function can select a resource specification up to that of a single physical machine, for example, a maximum of 8 physical cards. There can be one or more slave functions, and the sum of the resource specifications of the slave functions and the resource specification of the main function can be greater than, or far greater than, the upper limit of the resource specification of a single physical machine. The container instance corresponding to the main function and the container instances corresponding to the slave functions can establish connections with each other and, once connected, provide AI services externally. In response to a request from an AI service user U2 to use the cloud service program that the AI service developer has deployed on the cloud service platform, the main function can decompose the computational tasks in the inference task and distribute them to one or more slave functions. After receiving the distributed computational tasks, the slave functions perform the corresponding calculations and return the calculation results to the main function container instance. The main function container instance then aggregates the calculation results of the main function and the slave functions, and sends the resulting inference result to the electronic device on the AI service user's side.
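The decompose, distribute, and aggregate flow described above can be sketched in a few lines. This is an illustration under loud assumptions: threads stand in for the network connections to slave function container instances, `worker` is a placeholder computation (squaring numbers) rather than real inference, and the names `split`, `worker`, and `main_function` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, parts):
    """Split items into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def worker(chunk):
    """Placeholder for the computation one container instance performs."""
    return [x * x for x in chunk]


def main_function(inputs, slave_count):
    """Decompose the task, compute one share locally, and merge results in order."""
    chunks = split(inputs, slave_count + 1)  # one share stays with the main function
    with ThreadPoolExecutor(max_workers=max(1, slave_count)) as pool:
        # Distribute the remaining shares; each future stands in for one slave function.
        futures = [pool.submit(worker, c) for c in chunks[1:]]
        results = worker(chunks[0])          # the main function's own share
        for future in futures:
            results.extend(future.result())  # collect slave results in chunk order
    return results
```

In this sketch the main function keeps one share for itself, matching the description that the main function and slave functions compute in parallel before the main function aggregates the results.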

[0086] It should be noted that in this embodiment, both the main function and the slave functions can run as containers, and both can be viewed as container instances running on a cloud service platform. The main function is a container instance with a function name that can run independently, while the slave functions are container instances that depend on the main function to run; slave functions can also be called child instances. Generally, when AI service developers rent computing resources for cloud service programs, they first create a main function. The child instances created based on this main function are called slave function container instances (or simply slave functions). There is typically one main function, but there can be multiple slave functions, all of which depend on that single main function. For example, in the interface I0 shown in Figure 11, the function named "**rithm_proxy" is a main function, and multiple child instances can be enabled based on this main function to expand computing resources. As another example, in the function creation interface I1 shown in Figure 7, the controls within the dashed box I11 are used to configure the resource specifications corresponding to the main function, and the controls within the dashed box I12 are used to enable child instances. The number of slave functions and the resource specifications corresponding to each slave function can be configured in interface I2.

[0087] Thus, when creating a function, based on the main function, the resource specifications of the slave function can be configured to utilize the computing resources corresponding to the slave function to undertake part of the computing tasks of the main function and perform parallel computing with the main function, thereby achieving the purpose of enhancing NPU / GPU computing power and breaking through the physical limitation of a maximum of 8 GPU / NPU cards on a single physical machine.

[0088] Furthermore, when it is necessary to modify the GPU / NPU computing power, there is no need to recreate the function; the GPU / NPU computing power can be dynamically modified directly by changing the specification of the function.

[0089] For example, the resource specifications corresponding to the main function are: card type is GPU; graphics card: 8*NVIDIA Tesla V100; video memory: 8*32GB;

[0090] The resource specifications corresponding to the slave functions are: card type is GPU; number of slave functions: 2; graphics card per slave function: 1*NVIDIA Tesla V100; video memory per slave function: 1*32GB;

[0091] Therefore, the sum of the resource specifications of the main function and the slave function is equivalent to 10 * NVIDIA Tesla V100, which is equivalent to expanding the original upper limit of 8 GPU cards to 10 GPU cards, breaking the upper limit of the resource specifications of a single physical machine.

[0092] When it is necessary to modify the GPU / NPU computing power, only the resource specifications corresponding to the slave functions need to be modified, without recreating the main function. For example, the number of slave functions can be modified to 8, while the resource specifications corresponding to a single slave function remain unchanged. This is equivalent to increasing the computing power from 10 GPU cards to 16 GPU cards through modification.
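The arithmetic in this example can be checked with a one-line helper. The function name `total_cards` is hypothetical; the numbers are those given in the example above.

```python
SINGLE_MACHINE_LIMIT = 8  # maximum cards on one physical machine, per the text


def total_cards(main_cards: int, slave_count: int, cards_per_slave: int) -> int:
    """Total cards available to a main function plus its slave functions."""
    return main_cards + slave_count * cards_per_slave


before = total_cards(main_cards=8, slave_count=2, cards_per_slave=1)  # 8 + 2*1
after = total_cards(main_cards=8, slave_count=8, cards_per_slave=1)   # 8 + 8*1
```

Both totals (10 and 16 cards) exceed the single-machine limit of 8, and the change from one to the other only touches the number of slave functions.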

[0093] It should be noted that there can be one or more slave functions, and the upper limit of the resource specifications of a single slave function can be the current upper limit of the resource specifications of a single physical machine (e.g., currently 8). There is no upper limit to the number of slave functions, which can be set according to the actual hardware resource situation.

[0094] The computing resource scheduling method provided in this application can be applied to electronic devices. Electronic devices can be user-side (e.g., developer-side) terminal devices, such as smartphones, tablet PCs, laptops, desktop computers, wearable devices, augmented reality (AR) / virtual reality (VR) devices, ultra-mobile personal computers (UMPCs), netbooks, or personal digital assistants (PDAs). This application does not impose any restrictions on the specific type of electronic device.

[0095] For example, Figure 1 is a schematic diagram of the structure of an electronic device provided in one embodiment of this application. The electronic device may be, for example, a smartphone. As shown in Figure 1, the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and a subscriber identification module (SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an accelerometer sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.

[0096] It is understood that the structures illustrated in the embodiments of this application do not constitute a specific limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may include more or fewer components than illustrated, or combine some components, or split some components, or have different component arrangements. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

[0097] Processor 110 may include one or more processing units, such as application processors (APs), modem processors, graphics processing units (GPUs), image signal processors (ISPs), controllers, video codecs, digital signal processors (DSPs), baseband processors, and / or neural network processing units (NPUs). These different processing units may be independent devices or integrated into one or more processors.

[0098] The controller can generate operation control signals based on the instruction opcode and timing signals to complete the control of instruction fetching and execution.

[0099] The processor 110 may also include a memory for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. This memory can store instructions or data that the processor 110 has just used or that are used repeatedly. If the processor 110 needs to use the instruction or data again, it can retrieve it directly from the memory. This avoids repeated accesses, reduces the waiting time of the processor 110, and thus improves the efficiency of the system.

[0100] In some embodiments, the processor 110 may include one or more interfaces. Interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver / transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input / output (GPIO) interface, a subscriber identity module (SIM) interface, and / or a universal serial bus (USB) interface, etc.

[0101] USB interface 130 is an interface that conforms to the USB standard, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, etc. USB interface 130 can be used to connect a charger to charge electronic device 100, and can also be used for data transfer between electronic device 100 and peripheral devices. It can also be used to connect headphones for audio playback, or to connect other electronic devices, such as AR devices.

[0102] It is understood that the interface connection relationships between the modules illustrated in the embodiments of this application are merely illustrative and do not constitute a structural limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may also employ different interface connection methods or combinations of multiple interface connection methods as described in the above embodiments.

[0103] The charging management module 140 receives charging input from a charger. The charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 receives charging input from the wired charger via a USB interface 130. In some wireless charging embodiments, the charging management module 140 receives wireless charging input via the wireless charging coil of the electronic device 100. While charging the battery 142, the charging management module 140 can also supply power to the electronic device 100 via the power management module 141.

[0104] The power management module 141 connects the battery 142, the charging management module 140, and the processor 110. The power management module 141 receives input from the battery 142 and / or the charging management module 140, providing power to the processor 110, internal memory 121, display screen 194, camera 193, and wireless communication module 160, etc. The power management module 141 can also monitor parameters such as battery capacity, battery cycle count, and battery health status (leakage current, impedance). In some other embodiments, the power management module 141 may also be located within the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may be located in the same device.

[0105] The wireless communication function of electronic device 100 can be realized through antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, modem processor and baseband processor, etc.

[0106] Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in electronic device 100 can be used to cover one or more communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization. For example, antenna 1 can be multiplexed as a diversity antenna for a wireless local area network. In some other embodiments, the antennas can be used in conjunction with tuning switches.

[0107] The mobile communication module 150 can provide solutions for wireless communication, including 2G / 3G / 4G / 5G, applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc. The mobile communication module 150 can receive electromagnetic waves via antenna 1, and perform filtering, amplification, and other processing on the received electromagnetic waves before transmitting them to a modem processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves for radiation via antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be housed in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 and at least some modules of the processor 110 may be housed in the same device.

[0108] The modem processor may include a modulator and a demodulator. The modulator modulates the low-frequency baseband signal to be transmitted into a mid-to-high frequency signal. The demodulator demodulates the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After processing by the baseband processor, the low-frequency baseband signal is transmitted to the application processor. The application processor outputs sound signals through an audio device (not limited to speaker 170A, receiver 170B, etc.) or displays images or videos through the display screen 194. In some embodiments, the modem processor may be a separate device. In other embodiments, the modem processor may be independent of the processor 110 and may be housed in the same device as the mobile communication module 150 or other functional modules.

[0109] The wireless communication module 160 can provide solutions for wireless communication applications on the electronic device 100, including wireless local area networks (WLANs) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technologies. The wireless communication module 160 can be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via antenna 2, performs frequency modulation and filtering of the electromagnetic wave signals, and sends the processed signal to processor 110. The wireless communication module 160 can also receive signals to be transmitted from processor 110, perform frequency modulation and amplification, and convert them into electromagnetic waves for radiation via antenna 2.

[0110] In some embodiments, antenna 1 of electronic device 100 is coupled to mobile communication module 150, and antenna 2 is coupled to wireless communication module 160, enabling electronic device 100 to communicate with networks and other devices via wireless communication technology. The wireless communication technology may include Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Time-Division Code Division Multiple Access (TD-CDMA), Long Term Evolution (LTE), BT, GNSS, WLAN, NFC, FM, and / or IR technologies, etc. The GNSS may include the Global Positioning System (GPS), the Global Navigation Satellite System (GLONASS), the BeiDou Navigation Satellite System (BDS), the Quasi-Zenith Satellite System (QZSS), and / or satellite-based augmentation systems (SBAS). In other words, the electronic device 100 has positioning and wireless communication capabilities.

[0111] Electronic device 100 implements display functions through a GPU, a display screen 194, and an application processor. The GPU is a microprocessor for image processing, connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations and for graphics rendering. Processor 110 may include one or more GPUs, which execute program instructions to generate or modify display information.

[0112] Display screen 194 is used to display images, videos, etc. Display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), etc. In some embodiments, electronic device 100 may include one or N displays 194, where N is a positive integer greater than 1.

[0113] Electronic device 100 can perform shooting functions through ISP, camera 193, video codec, GPU, display 194 and application processor.

[0114] The ISP (Image Signal Processor) is used to process data fed back from the camera 193. For example, when taking a picture, the shutter is opened, and light is transmitted through the lens to the camera's photosensitive element. The light signal is converted into an electrical signal, and the camera's photosensitive element transmits the electrical signal to the ISP for processing, transforming it into an image visible to the naked eye. The ISP can also perform algorithmic optimization of image noise, brightness, and skin tone. The ISP can also optimize parameters such as exposure and color temperature of the shooting scene. In some embodiments, the ISP can be set in the camera 193.

[0115] Camera 193 is used to capture still images or videos. An object is projected onto a photosensitive element by generating an optical image through the lens. The photosensitive element can be a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the light signal into an electrical signal, which is then passed to an ISP for conversion into a digital image signal. The ISP outputs the digital image signal to a DSP for processing. The DSP converts the digital image signal into image signals in standard RGB, YUV, or other formats. In some embodiments, the electronic device 100 may include one or N cameras 193, where N is a positive integer greater than 1.

[0116] Digital signal processors (DSPs) are used to process digital signals. Besides digital image signals, they can also process other digital signals. For example, when electronic device 100 selects a frequency, the DSP can perform Fourier transforms on the frequency energy.

[0117] Video codecs are used to compress or decompress digital video. Electronic device 100 may support one or more video codecs. Thus, electronic device 100 can play or record videos in various encoding formats, such as Moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.

[0118] An NPU (Neural Processing Unit) is a neural network computing processor that, by drawing inspiration from the structure of biological neural networks, such as the transmission patterns between neurons in the human brain, rapidly processes input information and can continuously learn on its own. NPUs enable intelligent cognitive applications in electronic devices, such as image recognition, facial recognition, speech recognition, and text understanding.

[0119] The external storage interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external storage interface 120 to perform data storage functions. For example, music, video, and other files can be saved on the external memory card.

[0120] Internal memory 121 can be used to store computer executable program code, which includes instructions. Internal memory 121 may include a program storage area and a data storage area. The program storage area may store the operating system, at least one application program required for a function (such as sound playback, image playback, etc.), etc. The data storage area may store data created during the use of electronic device 100 (such as audio data, phonebook, etc.). Furthermore, internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, universal flash storage (UFS), etc. Processor 110 executes various functional applications and data processing of electronic device 100 by running instructions stored in internal memory 121 and / or instructions stored in memory located in the processor.

[0121] Electronic device 100 can implement audio functions, such as music playback and recording, through audio module 170, speaker 170A, receiver 170B, microphone 170C, headphone jack 170D, and application processor.

[0122] Buttons 190 include a power button, volume buttons, etc. Buttons 190 can be mechanical buttons or touch-sensitive buttons. Electronic device 100 can receive button input and generate key signal inputs related to user settings and function control of electronic device 100.

[0123] Motor 191 can generate vibration alerts. Motor 191 can be used for incoming call vibration alerts or for touch vibration feedback. For example, different vibration feedback effects can correspond to touch operations performed on different applications (such as taking photos, playing audio, etc.). Motor 191 can also correspond to different vibration feedback effects for touch operations performed on different areas of the display screen 194. Different application scenarios (such as time reminders, receiving messages, alarm clocks, games, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effect can also be customized.

[0124] Indicator 192 can be an indicator light, used to indicate charging status, power changes, or to indicate messages, missed calls, notifications, etc.

[0125] The SIM card interface 195 is used to connect a SIM card. The SIM card can be inserted into or removed from the SIM card interface 195 to make contact with and separate from the electronic device 100. The electronic device 100 can support one or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, etc. Multiple cards can be inserted into the same SIM card interface 195 simultaneously. The multiple cards can be of the same or different types. The SIM card interface 195 is also compatible with different types of SIM cards. The SIM card interface 195 is also compatible with external memory cards. The electronic device 100 interacts with the network through the SIM card to realize functions such as calls and data communication. In some embodiments, the electronic device 100 uses an eSIM, i.e., an embedded SIM card. The eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.

[0126] The electronic device proposed in the embodiments of this application can also be a server, specifically a standalone server or a server cluster. For example, Figure 2 is a schematic diagram of the server structure in one embodiment of this application. As shown in Figure 2, server 200 may include one or more processors 210, a communication interface 220, a memory 230, and a communication bus 240 connecting the different components (including memory 230, communication interface 220, and processor 210).

[0127] Communication bus 240 represents one or more of several bus architectures, including a memory bus or memory controller, a peripheral bus, a graphics acceleration port, or a local bus using any of the various bus architectures. For example, communication bus 240 may include, but is not limited to, an industry standard architecture (ISA) bus, a micro channel architecture (MCA) bus, an enhanced ISA bus, a video electronics standards association (VESA) local bus, and a peripheral component interconnection (PCI) bus.

[0128] Electronic devices typically include a variety of computer-readable media. These media can be any available media that can be accessed by the electronic device, including volatile and non-volatile media, and removable and non-removable media.

[0129] The memory 230 may include a computer system readable medium in the form of volatile memory, such as random access memory (RAM) and / or cache memory. The memory 230 may include at least one program product having a set (e.g., at least one) of program modules configured to execute the computing resource scheduling method provided in the embodiments of this application.

[0130] A program / utility having a set (at least one) of program modules can be stored in memory 230. Such program modules include, but are not limited to, an operating system, one or more application programs, other program modules, and program data. Each or some combination of these examples may include an implementation of a network environment. The program modules typically perform the functions and / or methods described in the embodiments of this application.

[0131] The processor 210 executes various functional applications and data processing by running programs stored in the memory 230, such as implementing the computing resource scheduling method provided in the embodiments of this application.

[0132] It should be understood that the processor 210 in the server 200 shown in Figure 2 can be a system-on-a-chip (SoC). The processor 210 may include a central processing unit (CPU) and may further include other types of processors, such as a graphics processing unit (GPU).

[0133] For example, the computing resource scheduling method provided in this embodiment of the application can be implemented based on the system architecture shown in Figure 3.

[0134] As shown in Figure 3, this system architecture may include a cloud server 301 deployed on the server side, an image repository 302, and terminal devices 3031 and 3032 that communicate remotely with the cloud server 301. Terminal device 3031 or terminal device 3032 can be a smartphone, desktop computer, tablet computer, etc. In this system architecture example, at the hardware level, the cloud server 301 may include a server cluster, such as a highly available (HA) cluster, a load balancing (LB) cluster, or a high-performance computing (HPC) cluster.

[0135] Cloud service platforms that provide cloud server rental services pre-release base images (or parent images) with master-slave function interaction modules, or release software development kits (SDKs) with master-slave function interaction capabilities. A base image with a master-slave function interaction module is a container image obtained by integrating the master-slave function interaction module into an operating system image.

[0136] The master-slave function interaction module is used to enable interaction between the master and slave functions, or between master function container instances and slave function container instances. This module can be a software module or a combination of software and hardware. An SDK with master-slave function interaction capabilities refers to an SDK that enables interaction between the master and slave functions. Specifically, this interaction can involve establishing connections between the master and slave functions and transmitting various data, such as task data and computation results, over these connections.
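The interaction described above can be illustrated with a minimal Python sketch in which a master endpoint connects to one slave endpoint, sends task data, and receives a computation result. The socket pair, the newline-delimited JSON message format, and the names `send_msg`, `recv_msg`, `slave_loop`, and `run_master` are assumptions made for illustration only; the application does not specify the transport or the message layout.

```python
import json
import socket
import threading


def send_msg(conn: socket.socket, payload: dict) -> None:
    """Send one newline-terminated JSON message."""
    conn.sendall((json.dumps(payload) + "\n").encode("utf-8"))


def recv_msg(conn: socket.socket) -> dict:
    """Receive one newline-terminated JSON message."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = conn.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return json.loads(buf.decode("utf-8"))


def slave_loop(conn: socket.socket) -> None:
    """Hypothetical slave side: read one task, compute, return the result."""
    with conn:
        task = recv_msg(conn)
        send_msg(conn, {"task_id": task["task_id"], "result": sum(task["data"])})


def run_master(data: list) -> dict:
    """Hypothetical master side: connect, send task data, wait for the result."""
    master_conn, slave_conn = socket.socketpair()
    worker = threading.Thread(target=slave_loop, args=(slave_conn,))
    worker.start()
    with master_conn:
        send_msg(master_conn, {"task_id": 1, "data": data})
        reply = recv_msg(master_conn)
    worker.join()
    return reply
```

In a real deployment the two endpoints would run in separate container instances, possibly on different physical machines; the in-process socket pair only stands in for that connection.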

[0137] Terminal device 3031 receives a custom container image obtained based on instructions from AI service developer U1 (first user).

[0138] Custom container images can be obtained in two ways. One way is to integrate the trained AI model into a cloud service application, and then install the cloud service application with the integrated AI model into the base image published by the cloud service platform, thus obtaining a custom container image. The other way is to integrate the trained AI model into the cloud service application, and also integrate an SDK with master-slave function interaction capabilities into the cloud service application. Finally, the cloud service application with both the AI model and the SDK is installed into the base operating system image, thus obtaining a custom container image.

[0139] The AI model can be any model used to provide AI services to users. For example, it can be a composite model obtained by integrating one or more of the following models: natural language processing model, computer vision model, speech recognition and generation model, reinforcement learning model, multimodal model, etc.

[0140] According to the upload instruction from the first user U1, the terminal device 3031 uploads the received custom container image to the image repository 302.

[0141] Terminal device 3031 receives the resource specification information of the main function and slave function selected by the first user U1 and uploads it to cloud server 301.

[0142] At the software level, a function management module is deployed in cloud server 301. The function management module is used to start a main function container instance and at least one slave function container instance based on a custom container image and according to the resource specification information selected by the first user U1, and to trigger the connection between the main function container instance and the slave function container instance.

[0143] After a successful connection, in response to a call request for the cloud service program from the AI service user U2 (the second user), the main function container instance decomposes the inference task corresponding to the call request into multiple computational tasks and distributes them to the hardware devices corresponding to the main function container instance and the at least one slave function container instance, each of which executes its own computational tasks. These hardware devices can perform the computations in parallel. Afterwards, the main function container instance aggregates the computational results obtained by the main function container instance and the slave function container instances, and sends the inference result obtained from the aggregated results to the terminal device 3032.
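The decompose, distribute, and aggregate flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the application: the thread pool stands in for the hardware behind the main and slave function container instances, the even contiguous split and the summation used for aggregation are placeholders, and the names `split_evenly` and `run_inference` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split_evenly(items: Sequence[int], parts: int) -> List[List[int]]:
    """Divide items into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(list(items[start:end]))
        start = end
    return chunks


def run_inference(items: Sequence[int],
                  workers: List[Callable[[List[int]], int]]) -> int:
    """Hypothetical main-function logic: scatter, compute in parallel, gather."""
    chunks = split_evenly(items, len(workers))
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        # One worker per container instance (index 0 plays the main function).
        futures = [pool.submit(w, c) for w, c in zip(workers, chunks)]
        partials = [f.result() for f in futures]
    return sum(partials)  # aggregation step performed by the main function
```

For example, `run_inference(list(range(10)), [sum, sum, sum])` models one main function and two slave functions that each reduce their own chunk before the main function combines the partial results.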

[0144] The computing resource scheduling method provided in this application can be applied to various application scenarios involving AI services, such as manufacturing, smart homes, transportation, and social communication. Specific AI services can include image processing, natural language processing, video generation, visual question answering, and multimodal data processing. For example, image processing can include AI image retouching and AI image generation services, while video generation can include text-to-video generation services.

[0145] To facilitate understanding, the following describes the computing resource scheduling method provided in this embodiment of the application through specific embodiments, with reference to the system architecture example shown in Figure 3.

[0146] As shown in Figure 4, the computing resource scheduling method proposed in this embodiment of the application can include three parts: (i) generating a custom container image; (ii) creating and deploying a main function and a slave function; and (iii) performing parallel computation with the main function and the slave function and returning the inference result.

[0147] This embodiment takes an AI image retouching application scenario as an example of the provided AI service. User U2 (the AI service user) uploads a photo through an app installed on terminal device 3032 (e.g., a mobile phone), hoping to receive a retouched image. Terminal device 3032 sends the photo uploaded by user U2 to cloud server 301. Cloud server 301 uses the computing resources corresponding to the master and slave functions pre-deployed by user U1 to perform the parallel computations involved in the image retouching task. After aggregating the computation results of the master and slave functions, it returns a retouched image to terminal device 3032 for display.

[0148] Specifically, as shown in Figure 5, the computing resource scheduling method in this embodiment may include all or part of the following steps:

[0149] S10: Terminal device 3031 obtains a custom container image with a master-slave function interaction module.

[0150] In AI-powered image editing applications, a trained AI image editing model can be identified first. This embodiment uses the deployment of an AI model in the provided AI service as an example for illustration. In other embodiments, other tools can be used to provide AI services, and are not limited to AI models.

[0151] A custom container image with a master-slave function interaction module can be obtained using one of the following two methods:

[0152] Method 1:

[0153] As shown in Figure 6a, cloud service platforms providing cloud server rental services can publish base images with master-slave function interaction modules at the front end. These base images, also known as parent images, are container images that integrate or install the master-slave function interaction module within an operating system image. The operating system image can be a Linux-based operating system image, such as Debian, Ubuntu, CentOS, Fedora, etc.

[0154] The terminal device detects the download instruction from user U1 (AI service developer) for the aforementioned parent image and sends a download request (or acquisition request) to the cloud service platform. In response to the terminal device's download request, the cloud service platform distributes the aforementioned parent image to the terminal device. After receiving the aforementioned parent image, the terminal device receives the user's installation instruction to install the cloud service program into the parent image, and installs the user-specified cloud service program into the parent image, thus obtaining a custom container image.

[0155] The cloud service program specified by the user is a cloud service program that integrates an AI model. For example, in this embodiment, the AI model is a trained model with AI image editing capabilities. The terminal device receives the instruction from user U1 to integrate the trained AI model into the cloud service program, and generates a cloud service program that integrates the AI model.

[0156] In other words, in Method 1, the custom container image is a container image obtained by installing a cloud service program that integrates an AI model, using a container image provided by the cloud service platform that includes a master-slave function interaction module as the base image. The container image provided by the cloud service platform, which includes a master-slave function interaction module as the base image, is a container image that integrates a module with master-slave function interaction capabilities on top of an operating system image.

[0157] The order of the two steps, obtaining the base image (parent image) and integrating the AI model into the cloud service program, is not limited. In Method 1, the terminal device can be terminal device 3031 or another electronic device on the user U1 side.

[0158] Method 2:

[0159] As shown in Figure 6b, cloud service platforms can publish SDKs with master-slave function interaction capabilities on the front end.

[0160] The terminal device detects the download instruction of user U1 (AI service developer) for the above SDK and sends a download request (or acquisition request) to the cloud service platform. In response to the download request of the terminal device, the cloud service platform distributes the above SDK to the terminal device.

[0161] The terminal device receives an instruction from user U1 to integrate the SDK into the cloud service program, generating a cloud service program with an integrated master-slave function interaction module. Next, the terminal device receives an instruction from user U1 to integrate a trained AI model into the aforementioned cloud service program with the integrated master-slave function interaction module, generating a cloud service program with both the master-slave function interaction module and the AI model. For example, the AI model could be a model with AI image editing capabilities.

[0162] There is no requirement for the execution order of the two steps: integrating the AI model and integrating the SDK.

[0163] For example, the terminal device could receive an instruction from user U1 to integrate an AI model into a cloud service program, and first generate a cloud service program that integrates the AI model. Next, the terminal device would receive an instruction from user U1 to integrate the SDK into the aforementioned cloud service program that integrates the AI model, generating a cloud service program that integrates the AI model and has a master-slave function interaction module.

[0164] After the cloud service program that integrates the AI model and has a master-slave function interaction module is obtained, it is installed on the operating system image specified by user U1, resulting in a custom container image. Here, the operating system image specified by the user can be a Linux-based operating system image.

[0165] The terminal device in Method 2 can be terminal device 3031 or other electronic devices on the user U1 side.

[0166] S11: Terminal device 3031 uploads a custom container image to the image repository.

[0167] In this embodiment, the terminal device 3031 may upload a custom container image to an image repository in response to an upload command issued by user U1. For example, in the image management interface, if the user's upload command is detected (e.g., clicking on "Upload" or a control indicating upload, or performing other specified actions), the terminal device 3031 can push the custom container image to the custom image repository of the cloud service platform, and user U1 can subsequently use this custom image when creating functions.

[0168] S12: The image repository returns the image address of the custom container image to the terminal device 3031.

[0169] The image address can be the storage address of a custom container image in an image repository, or it can be an index that makes it easy to quickly find the corresponding custom container image.

[0170] After user U1 uploads a custom container image to the image repository via terminal device 3031, the image repository will return an image address to terminal device 3031. Optionally, the image address can be displayed on terminal device 3031, allowing user U1 to know the image address of the custom container image successfully uploaded to the image repository. In subsequent processes, user U1 can quickly find the previously uploaded custom container image by entering or selecting this image address.
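Steps S11 and S12 can be pictured with the following minimal sketch, in which an in-memory object stands in for the image repository and returns an address after each upload. The class name `ImageRepository`, the host `registry.example.invalid`, and the digest-based address format are all assumptions chosen for illustration; the application only states that the address may be a storage address or an index.

```python
import hashlib
from typing import Dict


class ImageRepository:
    """Hypothetical in-memory stand-in for the image repository."""

    def __init__(self, host: str = "registry.example.invalid") -> None:
        self._host = host
        self._images: Dict[str, bytes] = {}

    def upload(self, name: str, image: bytes) -> str:
        """Store the image and return its address (S11 followed by S12)."""
        digest = hashlib.sha256(image).hexdigest()
        address = f"{self._host}/{name}@sha256:{digest}"
        self._images[address] = image
        return address

    def fetch(self, address: str) -> bytes:
        """Look up a previously uploaded image by its address."""
        return self._images[address]
```

Deriving the address from the image content is one possible design; it lets the same address serve both as a storage location and as a lookup index.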

[0171] S13: In response to the function creation command issued by user U1, display the function management interface.

[0172] For example, as shown in Figure 4, the function management interface can display main function configuration information and slave function configuration information.

[0173] The main function configuration information may specifically include runtime environment information, container image information, and resource specification information corresponding to the main function. The resource specification information corresponding to the main function is generally GPU / NPU specification information. The runtime environment field is used for the user to enter or select the image type, for example a custom container (i.e., a custom container image). In this embodiment, since the main function needs to run in the custom container image environment uploaded by user U1, the image type needs to be set to a custom container. The GPU / NPU specification information corresponding to the main function can have multiple candidate options from which the user can select. For example, in Figure 4, the resource specification selected by the user for the main function is 1*XXX / 1*32GB.

[0174] The slave function configuration information can specifically include the number of slave functions and the resource specifications for each individual slave function. The resource specifications for a single slave function are typically GPU / NPU specifications. For example, in Figure 4, the resource specification selected by the user for a single slave function is 1*XXX / 1*32GB.

[0175] The main function corresponds to the resources of a single physical machine, with a maximum resource limit of 8 physical cards. The slave function can correspond to the resources of a single physical machine or a virtual machine. The resource specifications for the main and slave functions can be the same or different, and users can also choose other resource specifications from multiple candidate options, not limited to the example shown in Figure 5.
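To make the relationship between per-function specifications and total capacity concrete, the following minimal sketch models a resource specification and sums the cards across the main function and its slave functions. The names `ResourceSpec` and `validate` are hypothetical, and applying the 8-card ceiling to each slave function as well as to the main function is an assumption of this sketch rather than something the application states.

```python
from dataclasses import dataclass
from typing import List

MAX_CARDS_PER_MACHINE = 8  # single-machine limit stated in the text


@dataclass(frozen=True)
class ResourceSpec:
    cards: int             # number of GPU / NPU cards
    compute_per_card: int  # computing power per card, in T

    def total_compute(self) -> int:
        return self.cards * self.compute_per_card


def validate(main: ResourceSpec, slaves: List[ResourceSpec]) -> int:
    """Return the total card count, rejecting specs over the per-machine limit."""
    for spec in [main, *slaves]:
        # Assumption: every function is bounded by one machine's card count.
        if not 1 <= spec.cards <= MAX_CARDS_PER_MACHINE:
            raise ValueError(f"cards must be 1..{MAX_CARDS_PER_MACHINE}")
    return main.cards + sum(s.cards for s in slaves)
```

With a main function of 8 cards and one slave function of 2 cards, `validate` returns 10, a total that no single 8-card physical machine could provide on its own.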

[0176] The function management interface shown in Figure 4 is only a schematic diagram of one example interface. In actual applications, the function management interface can take one or more of various forms that display all or part of the above information.

[0177] In some embodiments, the function management interface example shown in Figure 7 can also be used. In response to user U1's function creation command, terminal device 3031 displays the function management interface I1 shown in Figure 7. For example, when a user clicks the "Manage Functions" control in the previous interface, the terminal device 3031 detects the user's click action, generates a function creation instruction, and displays interface I1.

[0178] The interface I1 shown in Figure 7 includes the main function configuration area I11. The main function configuration area displays area I111 for setting the resource specifications corresponding to the NPU, area I112 for setting the resource specifications corresponding to the GPU, and control I113 for setting the memory and memory specifications corresponding to the main function.

[0179] Area I111 displays a control for setting whether to disable NPUs. An NPU accelerator card is used by default. If the user does not select the option to disable NPUs, controls for setting the number of NPUs and the computing power of a single NPU are displayed in I111. In this example interface, the computing power of a single NPU has two specifications: 280T or 376T. For example, as shown in the figure, user U1 selected a computing power of 376T for a single NPU. Here, T stands for tera (10^12) and denotes a unit of computing power. Optionally, if the user selects not to use NPUs, the controls for setting the computing power of a single NPU are not displayed. The number of NPUs can be set by the user; generally, it can be selected from 1 to 8.

[0180] As shown in Figure 7, in area I112, if the user selects not to use the GPU, the controls for setting the computing power of a single GPU are not displayed. In other cases, such as when the user selects to use the GPU (i.e., the user has not selected the "Do not use GPU" control), area I112 can display controls for setting whether to use the GPU, controls for setting the number of GPUs, and controls for setting the computing power of a single GPU. The computing power of a single GPU has two specifications: 280T or 376T.

[0181] Interface I1 also displays control I12, which is used to set whether the slave function feature is enabled. The sub-instance shown in Figure 7 is the slave function. When user U1 slides the button, the terminal device detects the sliding operation of user U1, enables the slave function feature, and displays area I21 in interface I1 for setting the slave function resource specifications.

[0182] In area I21, two sub-areas are currently displayed for setting the resource specifications of a single slave function: sub-area I211 and sub-area I212, indicating that the user has currently set two slave functions. Area I21 also displays a control I213 for adding new sub-instances (slave functions). For example, if user U1 clicks the "Add Sub-Instance" control I213, the terminal device detects the click action of user U1 on control I213 and adds a sub-area for configuring the resource specifications of a slave function.

[0183] Sub-region I211 or sub-region I212 can display controls for setting the number of cards corresponding to a single slave function, controls for setting the computing power of a single NPU accelerator card, and controls for setting the CPU / memory specifications corresponding to a single slave function.

[0184] For example, in the interface shown in Figure 7, the number of cards corresponding to a single slave function is 8, the computing power of a single NPU accelerator card is 376T (or 280T), and the CPU / memory specifications are 0.3 cores / 128M.

[0185] Optionally, in the interface example shown in Figure 7, when the cursor moves to the display area corresponding to control I12, a text prompt can be displayed: "Starting a sub-instance enables cross-machine scheduling and improves performance." This prompts the user to enable sub-instances, or informs the user of the effect of sub-instances after they have been enabled.

[0186] The interfaces shown in Figure 4 and Figure 7 are for illustrative purposes only. The function management interface can also be laid out and displayed in other ways, and is not limited to the examples above.

[0187] Specifically, in the application scenario of this embodiment, because AI image editing has high computational performance requirements, 10 GPU cards are needed to meet the user's requirements for inference latency. For example, the user can use the configuration parameters shown in Table 1 below for the resource specifications of the master and slave functions:

[0188] Table 1

[0189]

[0190]

[0191] S14: Terminal device 3031 receives the resource specification information corresponding to the main function selected by user U1 and the resource specification information corresponding to at least one slave function.

[0192] S15: Terminal device 3031 uploads master-slave function resource specification information to the server-side function management module in cloud server 301.

[0193] For example, after configuring a function in the interface shown in Figure 4 or Figure 7, user U1 clicks "Create and Deploy". The terminal device 3031 then transmits the resource specification information of the main function and the resource specification information (or parameters) of the slave function submitted by user U1 to the server-side function management module (hereinafter referred to as the function management module) of the cloud server 301.

[0194] S16: Terminal device 3031 uploads the image address selected by user U1 to the server-side function management module in cloud server 301.

[0195] In this embodiment, an area for users to set the runtime environment can also be provided in the function management interface, or a separate interface can be provided for configuring the runtime environment. For example, as shown in Figure 4, user U1 can set the runtime environment type to a custom container in this interface, and enter or select the image address of a previously uploaded custom container image in the input box corresponding to the container image. Terminal device 3031 uploads the image address selected by the user to the function management module on the server side.

[0196] S17: Obtain the corresponding custom container image based on the image address, and obtain the slave function container image.

[0197] The custom container image can be obtained based on the image address, or it can be downloaded from an image repository.

[0198] It should be noted that in this embodiment, the container image corresponding to the main function is called a custom container image. In other words, the aforementioned custom container image can be understood as the image used to start (or launch) the main function container. The container image corresponding to the slave function is pre-deployed by the cloud service platform and does not require user intervention. The cloud service platform can pre-generate multiple container images corresponding to slave functions. The slave function container image is the container image corresponding to the slave function, or in other words, the image used to start the slave function container instance.

[0199] S18: Based on a custom container image, start a main function container instance that meets the main function resource specifications.

[0200] Specifically, the server-side function management module receives various resource specification parameters configured through the function management interface. In this embodiment, the resource type specified by the main function is "GPU", and the resource specification is configured as "8*NVIDIA Tesla V100", that is, 8 GPU cards (physical cards), and the specification (model) of each GPU card is NVIDIA Tesla V100.

[0201] The server-side function management module can be used to maintain information on the available computing resources of physical cards or virtual machines in the server cluster. Based on the resource specification parameters (i.e., resource specification information) of the main function submitted by user U1, the function management module searches the available computing resources for resources that meet the specification requirements. These computing resources include the computing resources corresponding to the NPU and GPU, as well as resources such as CPU and memory. For example, in this embodiment, the computing resource prepared by the function management module for the main function is a virtual machine with a Linux operating system already installed. The total computing resources of this virtual machine can be greater than or equal to the resource specifications of the main function.

[0202] User U1 selects either "Custom Container" or "Custom Image" as the runtime environment type (or function image type) and specifies the specific image using the image address parameter. The server-side function management module then pulls the specific image pointed to by that image address onto the prepared virtual machine. Pulling can be understood as downloading or copying an image from the image repository to the prepared virtual machine.

[0203] In this embodiment, for example, the Docker tool can be used to start a container instance based on a container image. A container instance can be simply referred to as a container or instance.

[0204] For the main function, a container instance is started based on a custom container image.
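
As an illustration of starting a container instance with the Docker tool, the following minimal sketch only assembles a `docker run` command line and does not execute it. The helper name `build_docker_run_argv`, the image address, the container name, the environment variable, and the CPU / memory values are hypothetical and are not taken from this application; only the number of GPU cards (8) follows the embodiment.

```python
from typing import Dict, List


def build_docker_run_argv(image: str, name: str, gpus: int,
                          cpus: float, memory: str,
                          env: Dict[str, str]) -> List[str]:
    """Assemble (but do not run) a `docker run` command for one container instance."""
    argv = [
        "docker", "run", "-d",      # detached container
        "--name", name,
        "--gpus", str(gpus),        # number of GPU cards requested
        "--cpus", str(cpus),        # CPU quota
        "--memory", memory,         # memory limit, e.g. "16g"
    ]
    for key, value in sorted(env.items()):
        argv += ["-e", f"{key}={value}"]
    argv.append(image)              # image address selected by the user
    return argv


# Hypothetical values for the main function container instance.
argv = build_docker_run_argv(
    image="registry.example.com/demo/main-function:1.0",
    name="main-function",
    gpus=8,
    cpus=4,
    memory="16g",
    env={"ROLE": "main"},
)
print(" ".join(argv))
```

Building the argument list separately from executing it keeps the sketch free of side effects; a real platform would more likely use an orchestration API than a shell command.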

[0205] It should be noted that the image used to start the main function container instance in this embodiment is built on a base image containing a master-slave function interaction module. Alternatively, the cloud service platform may provide an SDK instead of a base image. The SDK, which provides the master-slave function interaction capability, can be integrated into the cloud service program on the electronic device on the user U1 side. For specific methods of obtaining a custom container image, please refer to Method 1 or Method 2 in S10 above, which will not be repeated here.

[0206] S19: Launch a slave function container instance that conforms to the slave function resource specification based on the slave function container image.

[0207] As shown in the interface example above, users can choose whether to enable slave functions. When the server-side function management module recognizes that the value of the "Enable slave functions" field in the uploaded resource specification parameters is "Yes", the server-side function management module continues to read the resource specification information of the slave functions.

[0208] It should be noted that the "function type" (i.e., resource type) of the slave function must be consistent with that of the main function. For example, as shown in Table 1 above, if the function type of the main function is GPU, then the function type of the slave function must also be "GPU". Alternatively, if the "function type" of the main function is NPU, then the function type of the slave function must also be NPU. Referring again to Table 1 above, in this embodiment, the "function specification" of the slave function is configured as "1*NVIDIA Tesla V100". Furthermore, the slave function can be configured with a "quantity" parameter; in this embodiment, the quantity is configured as "2". Accordingly, the server-side function management module in this embodiment can prepare Linux virtual machines of the corresponding specifications. It should be noted that necessary resource information such as CPU and memory parameters also needs to be configured; users can choose the specifics themselves, and these will not be elaborated upon here.
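
The consistency rule described above can be sketched as a small validation step. This is only an illustrative sketch: the `FunctionSpec` type, its field names, and the `validate_specs` helper are assumptions introduced here, not the platform's actual data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FunctionSpec:
    resource_type: str  # "GPU" or "NPU" (the "function type" in the text)
    card_model: str     # e.g. "NVIDIA Tesla V100"
    cards: int          # number of cards per function instance


def validate_specs(main: FunctionSpec, slave: FunctionSpec,
                   slave_count: int, slaves_enabled: bool) -> List[str]:
    """Return a list of problems; an empty list means the configuration is accepted."""
    if not slaves_enabled:
        return []  # slave specifications are only read when slaves are enabled
    errors = []
    if slave.resource_type != main.resource_type:
        errors.append("slave function type must match main function type")
    if slave_count < 1:
        errors.append("at least one slave function is required")
    if slave.cards < 1:
        errors.append("each slave function needs at least one card")
    return errors


main_spec = FunctionSpec("GPU", "NVIDIA Tesla V100", 8)
slave_spec = FunctionSpec("GPU", "NVIDIA Tesla V100", 1)
print(validate_specs(main_spec, slave_spec, slave_count=2, slaves_enabled=True))
```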

[0209] Furthermore, the number of virtual machines does not need to match the number of slave function containers. That is, the number of virtual machines can be the same as or different from the number of slave functions. It is sufficient that the total resources of the virtual machines can be divided according to the resource specifications of the corresponding slave function containers. For example, in this embodiment, the total resource specification required by the two slave functions is 2*NVIDIA Tesla V100. The function management module can allocate a single virtual machine containing 2*NVIDIA Tesla V100 for the two slave functions, or it can allocate two virtual machines, each containing 1*NVIDIA Tesla V100.
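
One way to check that a set of virtual machines can be divided among the slave function containers is a simple first-fit placement, sketched below. The function name and the first-fit strategy are assumptions made for illustration; this application does not specify a placement algorithm, and first-fit is a heuristic that can miss a feasible placement in some mixed-size cases.

```python
from typing import List, Optional


def assign_slaves_to_vms(vm_gpus: List[int],
                         slave_gpus: List[int]) -> Optional[List[int]]:
    """Place each slave container on the first VM with enough free cards.

    Returns the VM index chosen for each slave, or None if some slave
    cannot be placed.
    """
    free = list(vm_gpus)
    placement = []
    for need in slave_gpus:
        for index, capacity in enumerate(free):
            if capacity >= need:
                free[index] -= need
                placement.append(index)
                break
        else:
            return None  # no VM has enough free cards for this slave
    return placement


# Two slave functions, each needing 1 card, as in this embodiment.
print(assign_slaves_to_vms([2], [1, 1]))     # one VM with 2 cards
print(assign_slaves_to_vms([1, 1], [1, 1]))  # two VMs with 1 card each
```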

[0210] As mentioned earlier, the container image for the slave function can be provided by the cloud service platform, and user U1 only needs to select the version compatible with the main function. This is because the code of the main function and the slave function may be updated in actual use, and after an update the versions of the main function and the slave function must remain compatible; otherwise, they may not work together. For example, in this embodiment, user U1 selects "V1.0" as the "slave function version". The server-side function management module then pulls the slave function container image of that version onto the prepared virtual machine. Pulling can be understood as downloading from the image repository. Two slave function container instances are then started as containers.

[0211] The two slave function container instances launched by the server-side function management module can share computing power with the main function in subsequent processes, helping the main function overcome computing power bottlenecks.

[0212] S20: Establish a connection between the main function container and the slave function container.

[0213] In S19, the server-side function management module starts one main function container instance and two slave function container instances. In S20, the server-side function management module stores the network information of each container in the topology network composed of the main function container instance and the slave function container instances. For example, the function management module stores at least the network information of each container instance in the local topology network composed of the aforementioned one main function container instance and two slave function container instances. The function management module injects the network information of the two slave function container instances into the main function container instance. This injection can be done by transferring environment variables or files containing network information to the main function container instance. Based on the injected network information, the main function container instance attempts to establish connections with the two slave function container instances.

[0214] Network information can be the necessary information obtained by the main function when initiating a connection request to the slave function. For example, network information should at least include the port information and IP address information corresponding to the slave function's container instance. The port can be understood as the interface through which the slave function receives the main function's call request; port information can include the port number. The IP address information should at least include the container's IP address.

[0215] For example, in some embodiments, network information includes, but is not limited to, one or more of the following: node IP information, container IP information, open port numbers, authentication information, master-slave function topology connection information, etc.
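
A minimal sketch of injecting network information through an environment variable, and reading it back inside the main function container, might look as follows. The variable name `SLAVE_ENDPOINTS`, the JSON layout, and the addresses are hypothetical choices for illustration only.

```python
import json
from typing import Dict, List, Mapping, Tuple

ENV_KEY = "SLAVE_ENDPOINTS"  # hypothetical environment variable name


def build_injection_env(slaves: List[Dict[str, object]]) -> Dict[str, str]:
    """Function management module side: serialize slave network information."""
    payload = [
        {"pod_ip": s["pod_ip"], "container_ip": s["container_ip"], "port": s["port"]}
        for s in slaves
    ]
    return {ENV_KEY: json.dumps(payload)}


def read_injected_endpoints(environ: Mapping[str, str]) -> List[Tuple[str, int]]:
    """Main function side: recover (container IP, port) pairs to connect to."""
    raw = environ.get(ENV_KEY, "[]")
    return [(entry["container_ip"], int(entry["port"])) for entry in json.loads(raw)]


env = build_injection_env([
    {"pod_ip": "10.0.0.2", "container_ip": "172.17.0.2", "port": 9000},
    {"pod_ip": "10.0.0.3", "container_ip": "172.17.0.3", "port": 9000},
])
print(read_injected_endpoints(env))
```

In a running container the mapping passed to `read_injected_endpoints` would typically be `os.environ`; a file containing the same JSON would work equally well, as the text notes.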

[0216] Authentication information refers to one or more types of information used for identity authentication or verification between the main function and the slave function. For example, authentication information can be a certificate or ciphertext to be decrypted after key processing.

[0217] The master-slave function topology connection information may include the number of slave functions that the master function needs to connect to, or it may include NPU / GPU connection method information. NPU / GPU connection method information refers to the connection methods that the NPU / GPU in the slave function can support, such as whether it can connect only to the same master function or connect to multiple master functions. It should be noted that in some embodiments, the network information may not include master-slave function topology connection information.

[0218] Here, a node is a pod.

[0219] As shown in Figure 8, the main function container instance or the slave function container instance can run in the form of a pod. A pod includes one main container and multiple auxiliary containers. For ease of distinction, the main container corresponding to the main function is defined as the first container, and the main container corresponding to the slave function is defined as the second container. Establishing a connection can be understood as establishing a connection between the first container and the second container. The function management module injects the network information of the slave function container instance into the first container, and the first container initiates a connection request to the second container based on that information. Before initiating the connection request, the first container needs to obtain at least the container IP address of the second container, the IP address of the pod corresponding to the second container, and the port number of the call port opened by the slave function. After obtaining at least this IP and port information from the injected network information, the first container initiates a connection request to the corresponding container instance. In other words, the network information injected into the first container by the function management module may include, but is not limited to, the IP address of the second container, the IP address of the pod corresponding to the second container, and the port number (ID) of the port on the slave function container instance that the main function calls.

[0220] It should be noted that the connection can be initiated by the main function container instance, specifically by the master-slave function interaction module within the main function container instance.

[0221] As shown in Figure 9, corresponding to Method 1 and Method 2 for obtaining the custom container image in step S10, the part below the dashed line is a schematic of the custom container image (i.e., the main function container image) obtained using Method 1, and the part above the dashed line is a schematic of the custom container image obtained using Method 2.

[0222] In Method 1, the master-slave function interaction module is essentially deployed within the operating system. During connection establishment, the master-slave function interaction module initiates a connection request to the slave function container instance. In Method 2, this is equivalent to integrating an SDK with master-slave function interaction functionality into the cloud service application. During connection establishment, the master-slave function interaction module obtained from the SDK initiates a connection request to the slave function application. Both the cloud service application and the slave function application can utilize the underlying hardware resources to execute their respective computing tasks.

[0223] In this embodiment, one main function container instance attempts to establish a connection with two slave function container instances. If the connection with both slave functions is successful, the connection is complete. Otherwise, the main function container instance will attempt to establish a connection with the two slave functions again until the connection with all slave functions is successful.
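
The retry behaviour described above can be sketched as a loop that runs until every slave function is connected. The sketch assumes a caller-supplied `try_connect` callable and retries only the endpoints that are still unconnected; both choices, along with the `retry_delay_s` and `max_rounds` parameters, are illustrative assumptions rather than details from this application.

```python
import time
from typing import Callable, Iterable, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


def connect_all(endpoints: Iterable[Endpoint],
                try_connect: Callable[[Endpoint], bool],
                retry_delay_s: float = 1.0,
                max_rounds: Optional[int] = None) -> int:
    """Retry until every slave endpoint is connected; return the rounds used."""
    pending = set(endpoints)
    rounds = 0
    while pending:
        if max_rounds is not None and rounds >= max_rounds:
            raise ConnectionError(f"still unconnected: {sorted(pending)}")
        # Keep only the endpoints whose connection attempt failed.
        pending = {ep for ep in pending if not try_connect(ep)}
        rounds += 1
        if pending:
            time.sleep(retry_delay_s)
    return rounds


# Simulated connector: the second slave only answers on its second attempt.
attempts = {}


def fake_connect(ep: Endpoint) -> bool:
    attempts[ep] = attempts.get(ep, 0) + 1
    return ep[1] == 9000 or attempts[ep] >= 2


rounds_used = connect_all([("172.17.0.2", 9000), ("172.17.0.3", 9001)],
                          fake_connect, retry_delay_s=0)
print(rounds_used)
```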

[0224] For example, as shown in Figure 10, establishing a connection between a main function container instance and a slave function container instance may include some or all of the following processes:

[0225] S201: The server-side function management module records and maintains the networking information of at least one slave function corresponding to the main function.

[0226] S202: The server-side function management module injects the network information of at least one slave function container instance into the master-slave function interaction module in the main function container instance.

[0227] S203: Based on the network information, the master-slave function interaction module calls the interface of at least one slave function to obtain the configuration information corresponding to the slave function container instance.

[0228] For example, the network information should at least include the IP address and port number corresponding to the slave function container instance. The master-slave function interaction module in the main function container instance calls the slave function's interface based on the IP address and port number to obtain the slave function's configuration information.

[0229] S203: In response to a call request from the main function container instance, the slave function container instance obtains the configuration information of its own device.

[0230] It should be noted that when the server-side function management module launches the main function container instance and the slave function container instances, it launches them according to the resource specifications configured by the user in the front-end interface. When the server-side function management module injects network information into the master-slave function interaction module, it may pass only the information necessary for the main function to initiate the connection request to the slave function (such as the port information and IP address information of the slave function). In that case the master-slave function interaction module does not know the actual resource specifications of the slave function container instance to be connected, so it needs to obtain the configuration information from the slave function container instance.

[0231] The configuration information may include information such as the resource specifications of the hardware device corresponding to the slave function container instance and the software version of the slave function program.

[0232] S204: The slave function container instance returns the configuration information of its own device to the master-slave function interaction module.

[0233] The configuration information includes the resource specifications of the hardware device corresponding to the slave function and the version information of the slave function program. For example, the version information of the slave function program in Table 1 is V1.0.

[0234] S205: Return the configuration information of the slave function container instance to the main function container instance.

[0235] S206: If the main function container instance determines that the resource specifications in the configuration information meet the resource specification requirements of the slave function configured by the user, the main function container instance issues a certificate to the slave function container instance.

[0236] For example, if the resource specifications in the configuration information meet the resource specification requirements of the slave functions in Table 1 above, that is, the resource specifications corresponding to a single slave function are: Graphics card: 1*NVIDIA Tesla V100; Video memory: 1*32GB, then a certificate is issued. The issued certificate can be a certificate whose signature is to be verified against the root certificate.
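
The check performed before a certificate is issued can be sketched as a comparison between the configuration reported by the slave function and the specification configured by the user. The dictionary keys and the reading of "meet" as "same card model, at least the configured number of cards and video memory" are assumptions made for this sketch; certificate issuance itself is not shown.

```python
from typing import Mapping


def meets_requirements(reported: Mapping[str, object],
                       required: Mapping[str, object]) -> bool:
    """True if the reported slave configuration satisfies the configured spec."""
    return (
        reported.get("card_model") == required["card_model"]
        and reported.get("cards", 0) >= required["cards"]
        and reported.get("memory_gb_per_card", 0) >= required["memory_gb_per_card"]
    )


# Requirement for a single slave function in this embodiment.
required_spec = {"card_model": "NVIDIA Tesla V100", "cards": 1, "memory_gb_per_card": 32}
reported_spec = {"card_model": "NVIDIA Tesla V100", "cards": 1, "memory_gb_per_card": 32}

if meets_requirements(reported_spec, required_spec):
    print("specification met: a certificate may be issued")
```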

[0237] S207: The slave function container instance verifies the certificate issued by the main function container instance using the root certificate that it holds.

[0238] If the verification passes, the connection is established, or in other words, the connection is successfully established. After the connection is established, an asymmetric encryption algorithm can be used to perform a security verification between the main function and the slave function. If the verification passes, a symmetric key can be used to encrypt the data to be transmitted to avoid the impact of asymmetric encryption algorithms on data transmission performance.

[0239] Once the connection is established, data can be transmitted between the main function container instance and the slave function container instance based on the SSL protocol.

[0240] S21: After the connection is established, the main function container sends a notification message to the terminal device 3031.

[0241] Once the main function container instance and the slave function container instance are connected, the master-slave function interaction module can send a notification to the terminal device 3031 on the developer's side (i.e., user U1), informing the developer that the cloud service program is ready and the cloud service program developed by the developer can start providing AI services to the outside world.

[0242] S22: In response to the user U2's operation, the terminal device 3032 initiates a call request to the main function container in the cloud server 301 for the cloud service program that integrates the AI image editing model.

[0243] For example, user U2 uses terminal device 3032. User U2 selects a photo to be edited using the mobile app provided by the developer (user U1). The app uploads the image to be edited provided by user U2 to the cloud server and calls the API corresponding to the main function container instance in the cloud server. In other words, by calling the API, user U2 can use the AI image editing service provided by the cloud service program.

[0244] S23: The main function container decomposes the computational tasks involved in AI image retouching and distributes the decomposed tasks to the main function container and at least one slave function container.

[0245] The master-slave function interaction module in the main function container instance decomposes the AI image editing task into multiple parts, distributes them to the main function and at least one slave function, and performs the computation in parallel.

[0246] Task decomposition can be based on the ratio of the resource specifications of the main function and the slave functions. The proportion of the total task allocated to the main function = main function computing power / (main function computing power + total computing power of all slave functions); the proportion allocated to a single slave function = computing power of that slave function / (main function computing power + total computing power of all slave functions).

[0247] For example, in this embodiment, as shown in Table 1 above, the resource specifications corresponding to one main function are:

[0248] Graphics cards: 8*NVIDIA Tesla V100; Video memory: 8*32GB;

[0249] The resource specifications for each of the two slave functions are as follows:

[0250] Graphics card: 1*NVIDIA Tesla V100; Video memory: 1*32GB;

[0251] Therefore, in terms of graphics cards (GPUs), the task weight of the main function is 8 / (8 + 2*1) = 0.8. The task can be divided into 10 parts: 8 parts are distributed to the main function container instance, and 2 parts are distributed to the two slave function container instances (1 part each). In this way, the main function container instance and the slave function container instances compute in parallel, and the total computation time is approximately the time needed to compute one part of the task.
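
The weighting in this example can be reproduced with a short calculation. The sketch below uses the number of cards as a stand-in for computing power, which is reasonable here only because all cards are the same model; the function name `task_shares` is hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple


def task_shares(main_cards: int,
                slave_cards: List[int]) -> Tuple[Fraction, List[Fraction]]:
    """Share of the total task for the main function and for each slave function."""
    total = main_cards + sum(slave_cards)
    main_share = Fraction(main_cards, total)
    slave_shares = [Fraction(cards, total) for cards in slave_cards]
    return main_share, slave_shares


# One main function with 8 cards and two slave functions with 1 card each.
main_share, slave_shares = task_shares(8, [1, 1])
print(main_share, slave_shares)
```

Exact fractions are used so that the shares always sum to 1 without floating-point rounding.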

[0252] In this embodiment, taking AI image editing as an example, the AI image editing process involves a large amount of graphics computation. Therefore, the task to be decomposed can include graphics computation tasks. In other embodiments, the task can include different computational tasks such as general computation and neural network computation. For example, general computation can be one or more of video encoding / decoding, matrix computation, etc. Neural network computation can be computation between neurons in a neural network or operations within or between different network layers. For example, neural network computation can include convolution operations, attention mechanism operations, etc.

[0253] S24: The main function container receives the computation result returned by at least one slave function container.

[0254] S25: The main function container combines the computation results of the main function and at least one slave function.

[0255] S26: The main function container returns the inference result to the terminal device 3032.

[0256] After the calculation is completed, the master-slave function interaction module in the main function container instance merges the calculation result of its own device with the calculation result of at least one slave function, and returns the output result of the AI model to the terminal device 3032.
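
The decompose, compute in parallel, and merge flow of S23 to S26 can be sketched on a single machine with threads standing in for the main and slave function container instances. Everything here is a simplified assumption: the work is modelled as a list of independent items, the split is contiguous and proportional to the weights, and each "worker" is an ordinary Python callable rather than a remote container.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split_by_weights(items: Sequence, weights: Sequence[int]) -> List[list]:
    """Cut items into contiguous chunks whose sizes follow the weights."""
    total = sum(weights)
    chunks, start, accumulated = [], 0, 0
    for weight in weights:
        accumulated += weight
        end = len(items) * accumulated // total
        chunks.append(list(items[start:end]))
        start = end
    return chunks


def run_distributed(items: Sequence, weights: Sequence[int],
                    workers: Sequence[Callable[[list], list]]) -> list:
    """Send one chunk to each worker, run them in parallel, merge in order."""
    chunks = split_by_weights(items, weights)
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = [pool.submit(worker, chunk) for worker, chunk in zip(workers, chunks)]
        results = [future.result() for future in futures]
    merged = []
    for part in results:
        merged.extend(part)
    return merged


def double(chunk: list) -> list:
    """Placeholder for the real computation performed on one chunk."""
    return [value * 2 for value in chunk]


# Weights 8 : 1 : 1 correspond to the main function and two slave functions.
output = run_distributed(list(range(10)), [8, 1, 1], [double, double, double])
print(output)
```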

[0257] The inference result can be obtained by the AI model based on the merged calculation results. The calculation result can be the calculation result of an intermediate layer in the AI model's inference process, or the calculation result of the output layer. The inference result can be obtained directly or indirectly from the calculation result.

[0258] For example, the inference result could be a photo processed by the AI model, which is then returned to the terminal device 3032. User U2 can then view the retouched photo on the client (App) corresponding to the cloud service program with AI photo retouching capabilities.

[0259] Based on the embodiment shown in Figure 5 and the above description, many other embodiments can be obtained. These will not be listed one by one in this application specification.

[0260] With a custom container image containing a master-slave function interaction module, user U1 can choose to enable the slave function feature and specify the resource specifications of the slave function when creating the function. After deployment, the main function container instance and the slave function container instances can be launched at the same time. The slave function container instances can share computing power with the main function container instance, thereby expanding the GPU / NPU computing power of the main function. As a result, the overall computing power obtained by the cloud service program is equal to the computing power of the main function plus the computing power of the slave functions. The computing power of the main function can be the computing power of a single physical machine. Thus, with the master-slave function collaborative parallel computing scheduling mechanism proposed in this application embodiment, the computing power that a cloud service program providing AI services can rent is no longer a bottleneck, and is no longer limited by the insufficient computing power of a single card or by the maximum of 8 cards per physical machine.

[0261] Furthermore, the number and specifications of slave functions can be flexibly configured as needed, and computing power can be shared. The more slave functions configured and the greater the computing power, the stronger the computing power of the main function will be.

[0262] In the method proposed in this application embodiment, when creating a function, the developer (user U1) can choose to enable or disable the slave function feature. When enabling the slave function feature, the resource specifications and quantity of the slave functions can be configured. After the resource specification information of the slave functions is synchronized to the server, the server will also launch the corresponding slave function container instance when launching the main function container instance. The slave functions will share computing power with the main function.

[0263] In addition, to enable the slave function feature, user U1 only needs to build their own image on the base image provided by the cloud service platform, or integrate the SDK with the master-slave function capability provided by the cloud service platform into the cloud service program.

[0264] The cloud service program does not need to be aware of the connection and calculation process of the master-slave function during operation, so it can be done without the user's awareness and avoid the problem of user experience being affected by changes in resource specifications.

[0265] As mentioned above, some cloud service platforms do not support dynamic modification of resource specifications. When the resource specifications that user U1 needs to rent change, the created function needs to be redeployed or recreated. The computing resource scheduling method proposed in this application supports dynamic modification of resource specifications without the need to redeploy or recreate the function.

[0266] As shown in Figure 11, in interface I0, clicking the "Create Function" control I01 displays interface I1 as shown in Figure 7, where a new function can be created. Interface I0 also displays the functions that have already been created. Clicking the "Edit" control I02 corresponding to a created function displays an editing interface for modifying that function. For example, if interface I0 displays a function named "**rithm_proxy" as an already created function, clicking the corresponding "Edit" control I02 displays interface I3, where the user can modify the resource specifications of the created function.

[0267] In interface I3, users can modify the resource specifications of either the slave function or the main function.

[0268] For example, as shown within the dashed rectangle I31 in Figure 11, compared with Figure 7, the user has changed the CPU / memory specification of a single sub-instance from 0.3 cores / 128M to 1.4 cores / 1536M.

[0269] In interface I3, users can click the "Add Sub-Instance" control I32 to add a sub-instance. As shown in the dashed box I33 in Figure 11, the user has added two sub-instances by clicking the "Add Sub-Instance" control.

[0270] During the creation process shown in Figure 7, users can also set the number of sub-instances by clicking the "Add Sub-Instance" control.

[0271] It should be noted that modifying the resource specifications of a slave function can involve modifying the resource specifications of an existing single sub-instance (slave function container instance), or increasing or decreasing the number of sub-instances and configuring the resource specifications of each sub-instance. In some embodiments, only multiple slave functions using the same resource specifications are supported. In other embodiments, configuring slave functions with different resource specifications is also supported.

[0272] In addition, in the interface examples shown in Figure 7 or Figure 11, the number of NPU cards is 8. In actual applications, in other embodiments, the number of NPU cards or GPU cards is not limited to 8 and can be other values.

[0273] Therefore, users only need to edit the existing functions to modify resource specifications without creating new functions, simplifying the process. The cloud service platform can automatically restart the corresponding resource specification's slave function container instance and reconnect it to the main function based on the user's modifications through the front-end interface. Users can be completely unaware of the restart and reconnection processes on the cloud service platform, significantly reducing the complexity of user operations and improving the user experience.

[0274] Specifically, following the process shown in Figure 5, and as shown in Figure 12, the process for modifying resource specifications can be achieved through the following steps:

[0275] S31: In response to user U1's modification of the resource specifications of the slave function in the function management interface, upload the modified resource specification information of the slave function to the server-side function management module.

[0276] For example, suppose the total computing power needs to be increased to 80 GPU cards. User U1 updates the function configuration in the function management interface. Previously, the main function was configured with 8 GPU cards, and the slave function was configured with 2 GPU cards, for a total computing power of 10 GPU cards. Therefore, 70 more GPU cards are needed. Consequently, the total computing power of the slave function needs to be changed from 2 cards to 72 cards. At this time, the relevant configuration of the slave function needs to be updated. The modification items are shown in Table 2 below:

[0277]

[0278]

[0279] S32: The server-side function management module restarts the slave function container instances based on the modified resource specification information.

[0280] After user U1 updates the function in the function management interface, the resource specification parameters shown in Table 2 above are passed to the server-side function management module of the cloud server. The server-side function management module receives the configuration parameters from the function management interface and recognizes that the "number of slave functions" has changed from "2" to "9" and the "specification of slave functions" has changed from "1*NVIDIA Tesla V100" to "8*NVIDIA Tesla V100". Therefore, it restarts the slave functions as 9 slave function container instances.
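The arithmetic behind this example (a target of 80 cards, 8 cards on the main function, 8 cards per slave function) can be sketched as follows. This is only an illustrative sketch: the function name `plan_slave_functions` and its parameters are hypothetical and do not come from this application, and rounding up to whole slave functions is an assumption.

```python
import math


def plan_slave_functions(target_total_cards: int, main_cards: int, cards_per_slave: int):
    """Return (slave_count, slave_total_cards) needed to reach the target.

    Hypothetical helper: rounds up so the target is always met or exceeded.
    """
    if cards_per_slave <= 0:
        raise ValueError("cards_per_slave must be positive")
    # Cards that the main function cannot provide on its own.
    missing = max(target_total_cards - main_cards, 0)
    slave_count = math.ceil(missing / cards_per_slave)
    return slave_count, slave_count * cards_per_slave


# Example from the text: 80 cards in total, main function has 8 cards,
# each slave function has 8 cards.
count, slave_cards = plan_slave_functions(80, 8, 8)
print(count, slave_cards)  # 9 72
```

With these inputs the sketch yields 9 slave functions providing 72 cards, which matches the change from 2 to 9 slave functions described above.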

[0281] Specifically, after the server-side function management module receives a request to update a slave function, it analyzes the changes in the slave function's resource specifications contained in the configuration information, and then restarts the slave function container instances. The logic or rules used are as follows:

[0282] If the resource specifications corresponding to a single slave function have not changed, existing slave function container instances can be reused. Slave function container instances that need to be deleted are notified to gracefully exit and scale down. For slave function container instances that need to be added, resources are reallocated and the corresponding container instances are started.

[0283] If the resource specifications corresponding to a single slave function change, all launched slave function container instances need to be scaled down and resources reallocated according to the modified specifications, and the corresponding slave function container instances need to be launched.

[0284] For example, in this embodiment, if the resource specification of a single slave function is changed from "1*NVIDIA Tesla V100" to "8*NVIDIA Tesla V100", then all the started slave function container instances need to be notified to gracefully exit and scale down, and then 9 slave function container instances with the specification of "8*NVIDIA Tesla V100" are started.
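The two rules above can be sketched as a small planning function. This is a minimal sketch under stated assumptions: `SlaveSpec` and `plan_restart` are hypothetical names, and the per-slave specification is modeled as a plain string that is compared for equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaveSpec:
    count: int       # number of slave functions
    per_slave: str   # resource specification of a single slave function


def plan_restart(old: SlaveSpec, new: SlaveSpec) -> dict:
    """Decide how many slave instances to reuse, stop, and start."""
    if old.per_slave != new.per_slave:
        # Per-slave specification changed: stop everything, start from scratch.
        return {"reuse": 0, "stop": old.count, "start": new.count}
    # Per-slave specification unchanged: reuse what exists, adjust the count.
    reuse = min(old.count, new.count)
    return {"reuse": reuse, "stop": old.count - reuse, "start": new.count - reuse}


plan = plan_restart(
    SlaveSpec(2, "1*NVIDIA Tesla V100"),
    SlaveSpec(9, "8*NVIDIA Tesla V100"),
)
print(plan)  # {'reuse': 0, 'stop': 2, 'start': 9}
```

In the example of this embodiment the per-slave specification changes, so both existing instances are stopped and 9 new ones are started; if only the count had changed, existing instances would be reused.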

[0285] S33: Establish a connection between the main function container instance and the re-launched slave function container instance.

[0286] The server-side function management module injects the networking information of the restarted slave functions into the main function container instance. The main function container instance then initiates connections with each restarted slave function based on its networking information. The specific connection establishment process can be found in step S20 above.

[0287] S34: After the connection is established, the main function container sends a notification message to the terminal device 3031, indicating that it can provide services to the outside world.

[0288] Once the main function container instance and the slave function container instance are connected, the main-slave function interaction module notifies the developer (user U1) that the cloud service program is ready, and the cloud service program developed by the developer begins to provide services to the outside world.

[0289] S35: In response to the usage command of user U2, terminal device 3032 initiates a call request to the cloud service program in the main function container instance.

[0290] S36: The main function container instance decomposes the task and distributes it to the slave functions for parallel computation.

[0291] S37: The main function container instance combines the computation results of the main function and the at least one slave function.

[0292] S38: The main function container instance returns the inference result to terminal device 3032.
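Steps S36 to S38 follow a scatter-gather pattern, which can be sketched as below. This is only an illustration: the instance names, the even split, and the stand-in computation (squaring numbers) are assumptions, and local threads stand in for the remote main and slave function container instances.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items, parts):
    """Split items into `parts` contiguous chunks of near-equal size."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_on_instance(instance_name, chunk):
    """Stand-in for the computation done by one container instance."""
    return [x * x for x in chunk]


def scatter_gather(task, instances):
    """Decompose the task, compute the parts in parallel, merge in order."""
    chunks = split_evenly(task, len(instances))
    with ThreadPoolExecutor(max_workers=len(instances)) as pool:
        # pool.map keeps the results in the same order as the inputs.
        partials = list(pool.map(run_on_instance, instances, chunks))
    merged = []
    for part in partials:
        merged.extend(part)
    return merged


result = scatter_gather(list(range(10)), ["main", "slave-1", "slave-2"])
print(result)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```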

[0293] The computing resource scheduling method provided in this application embodiment can dynamically adjust the total computing power of the entire service by dynamically modifying the specifications and quantity of function instances. User U1 does not need to recreate the function, which realizes dynamic modification of resource specifications, makes it more convenient for developer user U1 to modify resource specifications, and enhances the user experience.

[0294] Specifically, when user U1 modifies the configuration information of the slave function on the management interface, the terminal device 3031 will synchronize the corresponding configuration information change to the cloud server. The server-side function management module will re-complete the change of the slave function instance and the connection between the master and slave functions according to the configuration change. During the process, the cloud service program of developer user U1 is unaware of the actual change process, which reduces the impact of modifying resources on the cloud service program and avoids the problem of AI service being temporarily offline due to modification of resource specifications. In this way, it can also improve the user experience of AI service user U2.

[0295] Based on the above exemplary description, the embodiments of this application provide a method for scheduling computing resources, described from the perspective of the terminal device. As shown in Figure 13, the method may include:

[0296] S40: In response to a first user instruction to create a function, the terminal device displays the first resource specification information.

[0297] For example, as shown in Figure 11, in interface I0, when user U1 clicks the "Create Function" control I01, the terminal device detects user U1's click action on control I01, treats it as receiving the first user instruction to create a function, and displays the create-function interface I1 shown in Figure 7, which displays the first resource specification information.

[0298] The first resource specification information describes the first resource specification corresponding to the user-created main function. Specifically, the first resource specification may include information such as the type of card enabled, the number of cards, and the computing power of a single card. The card type can be an NPU or GPU, or even a TPU, etc. For GPUs, the computing power of a single card can be the GPU model and its video memory; for example, the configuration of a single GPU could be: Graphics card model: NVIDIA Tesla V100; Video memory: 32GB. For NPUs, the computing power of a single NPU accelerator card could be 376T or 280T, etc.

[0299] For example, in the above exemplary description, the controls within the dashed box I11 in the interface I1 shown in Figure 7 are used to edit the resource specifications corresponding to the main function. All or part of the content displayed in dashed box I11 represents the first resource specification information. For example, the controls displayed in dashed box I111 are used to set NPU specification information, while the controls displayed in dashed box I112 are used to set GPU specification information. For instance, in dashed box I111, if the "Quantity: 8" control is selected, it indicates that the number of NPU cards is 8; "Computing Power: 376T" indicates that the computing power of a single NPU card is 376T. The user can click the drop-down button to the right of 376T to select other computing power options. If the "Do Not Use NPU" control is not selected, but the "Do Not Use GPU" control is selected, it indicates that the card type is an NPU card; if the "Do Not Use NPU" control is selected, but the "Do Not Use GPU" control is not selected, it indicates that the card type is a GPU card.

[0300] S41: In response to a second user instruction to enable a slave function, the terminal device displays second resource specification information.

[0301] For example, as shown in Figure 7, in interface I1, user U1 performs a sliding action on the sliding button in control I12 that enables the sub-instance. The terminal device detects user U1's sliding action on control I12 and treats it as receiving a second user instruction from user U1 to enable the slave function. In response to the second user instruction, the second resource specification information is displayed.

[0302] The second resource specification information describes the second resource specifications corresponding to at least one slave function attached to the main function. The second resource specification may include the number of slave functions and the resource specifications of an individual slave function. Specifically, the resource specifications of an individual slave function may include: card type, number of cards, computing power of a single card, and resource information of basic hardware such as CPU and memory. For example, the controls within the dashed box I21 in interface I2 shown in Figure 7 are used to set the second resource specification information. Some or all of the content displayed within dashed box I21 represents this second resource specification information. Since the card type is set to NPU card in interface I1, after enabling the slave function, only the NPU resource specifications are displayed within dashed box I21. This means the default card type is NPU, the number of cards is 8, the computing power of a single NPU card is 376T, the corresponding CPU specification is 0.3 cores, and the memory specification is 128M. The interface example shown in Figure 7 is only an example with 8 NPU cards by default. In other embodiments, the number of cards can be set by the user to a value other than 8, and the upper limit of the value is determined according to the actual hardware.

[0303] The total computing power obtained based on the first and second resource specifications is greater than the computing power limit corresponding to a single physical machine. This can be achieved by uniformly quantifying the first and second resource specifications, as well as the resource specifications corresponding to a single physical machine, using the same unit or indicator. In this case, the sum of the values corresponding to the first and second resource specifications is greater than the value of the resource specification corresponding to a single physical machine. For example, the resource limit corresponding to a single physical machine is: 8 Tesla V100 graphics cards, each with 32GB of video memory, for a total video memory limit of 256GB; the first resource specification is: 8 Tesla V100 graphics cards, each with 32GB of video memory, for a total video memory of 256GB; the second resource specification is: 2 Tesla V100 graphics cards, each with 32GB of video memory, for a total video memory of 64GB; the total computing power obtained from the first and second resource specifications is equivalent to 10 Tesla V100 graphics cards, with a total video memory of 32GB * 10 = 320GB. Therefore, the total computing power of the first and second resource specifications can be represented as 10 graphics cards or 320GB of video memory, which is greater than the resource limit of a single physical machine of 8 graphics cards or 256GB.
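The comparison in this example can be sketched as follows, using total video memory as the common unit. The names `CardSpec` and `exceeds_single_machine` are hypothetical, and the choice of video memory as the unit is only one of the options mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardSpec:
    cards: int               # number of graphics cards
    memory_gb_per_card: int  # video memory of a single card, in GB

    @property
    def total_memory_gb(self) -> int:
        return self.cards * self.memory_gb_per_card


def exceeds_single_machine(first: CardSpec, second: CardSpec, limit: CardSpec) -> bool:
    """True if the combined specifications exceed one machine's limit."""
    total = first.total_memory_gb + second.total_memory_gb
    return total > limit.total_memory_gb


first = CardSpec(cards=8, memory_gb_per_card=32)    # main function
second = CardSpec(cards=2, memory_gb_per_card=32)   # slave functions in total
machine = CardSpec(cards=8, memory_gb_per_card=32)  # single physical machine

print(first.total_memory_gb + second.total_memory_gb)  # 320
print(exceeds_single_machine(first, second, machine))  # True
```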

[0304] S42: The terminal device uploads the first resource specification information and the second resource specification information to the server.

[0305] The server is used to obtain a first container image corresponding to a main function and at least one second container image corresponding to at least one slave function; the first container image deploys a cloud service program; the server is used to launch a first container instance conforming to first resource specification information based on the first container image, and launch at least one second container instance conforming to second resource specification information based on at least one second container image; establish a connection between the first container instance and at least one second container instance; and in response to a call request to the cloud service program, decompose the computing tasks corresponding to the cloud service program and distribute them to the first container instance and at least one second container instance through the connection.

[0306] In some embodiments, after S42, the following steps may also be included:

[0307] S43: In response to a third user instruction to edit an already created function, the terminal device displays third resource specification information.

[0308] The third resource specification information is obtained by the user modifying the second resource specification information, and is used to describe the third resource specification corresponding to at least one slave function after the modification.

[0309] For example, as shown in Figure 11, interface I0 displays the function names of two functions created by user U1. User U1 clicks the "Edit" control I02 corresponding to the created function named "**rithm_proty". The terminal device detects user U1's click action on control I02 and treats it as receiving a third user instruction. In response to the third user instruction, it displays the edit-function interface I3 shown in Figure 11.

[0310] In interface I3, the controls within the dashed box I31 are used to modify or edit the resource specifications corresponding to a single slave function. For example, the currently modified resource specifications for a single slave function (sub-instance) are: 8 NPU cards, NPU card acceleration computing power: 376T, and CPU / memory specifications: 1.4 cores / 1536M. A user can add a slave function by clicking the "Add Sub-Instance" control I32 once.

[0311] In the interface example shown in Figure 11, the resource specifications of the slave functions are equal to each other. In other embodiments, slave functions with different resource specifications can be supported.

[0312] Generally, modifying the resource specifications of a function is to enhance computing power; that is, the computing power corresponding to the modified third resource specification is greater than that corresponding to the second resource specification. In some cases, it is also possible to reduce computing power, for example, the computing power of the modified third resource specification is less than that of the second resource specification.

[0313] S44: The terminal device uploads the third resource specification information to the server.

[0314] The server is configured to launch at least one third container instance that conforms to third resource specification information based on at least one second container image; establish a connection between the first container instance and at least one third container instance; and, in response to a call request for a cloud service program, decompose the computing tasks corresponding to the cloud service program through the connection and distribute them to the first container instance and at least one third container instance.

[0315] For the steps of establishing a connection between the first container instance and at least one third container instance, refer to the explanations related to Figure 10 above, which will not be repeated here.

[0316] Based on the above exemplary description, the embodiments of this application provide a computing resource scheduling method, described from the perspective of the server side. As shown in Figure 14, the method may include:

[0317] S50: Receive the first resource specification information and the second resource specification information uploaded by the first terminal device.

[0318] The server is used to obtain a first container image corresponding to the main function and at least one second container image corresponding to at least one slave function; the first container image deploys a cloud service program. Based on the above example, the cloud service program integrates at least an AI model.

[0319] S51: Based on the first container image, start a first container instance that conforms to the first resource specification information, and based on at least one second container image, start at least one second container instance that conforms to the second resource specification information.

[0320] Launching at least one second container instance that conforms to the second resource specification information may mean that the total resource specification corresponding to the at least one second container instance conforms to the second resource specification information.

[0321] For example, based on the resource specifications shown in Table 1, it is necessary to start (or launch) a first container instance with the following resource specifications: GPU card type, 8*NVIDIA Tesla V100 graphics cards, and 8*32GB video memory, and start (or launch) two second container instances with the following resource specifications: GPU card type, 1*NVIDIA Tesla V100 graphics card, and 1*32GB video memory.

[0322] S52: Establish a connection between the first container instance and at least one second container instance.

[0323] Please refer to the explanations related to Figure 10, which will not be repeated here.

[0324] S53: In response to a call request for a cloud service program, decompose the computing task corresponding to the cloud service program and distribute it to a first container instance and at least one second container instance through a connection.

[0325] Task decomposition can be based on the proportion of the resource specifications corresponding to the main function and the slave functions. For example, based on the resource specifications shown in Table 1, the task can be decomposed into 8 parts, with the main function undertaking 6 parts of the computation task and each slave function undertaking 1 part of the computation task.
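A proportional split of this kind can be sketched as below. The weights are an input to the sketch: how they are derived from the resource specifications is not fixed here, and the function name `split_by_weights` and the largest-remainder rounding are assumptions made for illustration.

```python
def split_by_weights(total_parts: int, weights: list) -> list:
    """Distribute total_parts across instances in proportion to weights.

    Uses integer arithmetic and hands leftover parts to the instances
    with the largest remainders, so the shares always sum to total_parts.
    """
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    shares = [total_parts * w // total_weight for w in weights]
    remainders = [total_parts * w % total_weight for w in weights]
    leftover = total_parts - sum(shares)
    # Instances with the largest remainder receive one extra part each.
    order = sorted(range(len(weights)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


# Example from the text: 8 parts, main function takes 6, two slaves take 1 each.
print(split_by_weights(8, [6, 1, 1]))  # [6, 1, 1]
```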

[0326] With reference to the exemplary description related to Figure 6a, in some embodiments, in response to a first acquisition request from a second terminal device, the server may distribute a first operating system image to the second terminal device, and the server may then receive a first container image uploaded by the second terminal device.

[0327] For example, the first operating system image could be the base image shown in Figure 6a.

[0328] Both the first terminal device and the second terminal device are terminal devices on the user U1 side. The first terminal device and the second terminal device can be the same device or different devices. The first terminal device and the terminal device in the method shown in Figure 13 may be the same electronic device.

[0329] Based on the above exemplary description, the master-slave function interaction module is a module with the following functions: establishing a connection between a first container instance and at least one second container instance; and, in response to a call request for a cloud service program, decomposing the computing task corresponding to the cloud service program and distributing it to the first container instance and at least one second container instance through the connection. In addition, it can also receive computing results sent by at least one second container instance, merge the computing results of the first container instance and at least one second container instance, and send the inference result corresponding to the computing results to the terminal device on the user U2 side.

[0330] The first container image can be the custom container image shown in Figure 6a. With reference to Figure 6a, it can be seen that the first container image is obtained by installing a cloud service program on the base image (the first operating system image). The cloud service program integrates a trained artificial intelligence (AI) model; the cloud service program is used to provide AI services to users by calling the AI model.

[0331] With reference to the exemplary description related to Figure 6b, in response to a second acquisition request from a second terminal device, the server may send a software development kit (SDK) with a master-slave function interaction module to the second terminal device.

[0332] The first container image can be the custom container image shown in Figure 6b, which is obtained by installing a cloud service program into an operating system image. The cloud service program integrates an SDK and a trained AI model.

[0333] In some embodiments, a function management module is deployed in the server (e.g., it may be the server-side function management module shown in Figure 4). The function management module is used to transmit at least one set of network information corresponding to at least one second container instance to the first container instance. One set of network information includes one or more of the following: an Internet Protocol (IP) address and a port number corresponding to a second container instance.

[0334] Based on the exemplary descriptions related to Figure 8 and Figure 10, it can be seen that establishing a connection between a first container instance and at least one second container instance can be achieved as follows: the first container instance initiates a connection request to the corresponding at least one second container instance based on the at least one set of network information; each of the at least one second container instance, in response to the connection request, obtains the configuration information corresponding to its own container instance; the configuration information includes resource specification information corresponding to a second container instance; if the first container instance determines that the configuration information conforms to the second resource specification information, it issues a certificate to be verified to the second container instance; the second container instance verifies the certificate, and if the verification is successful, a connection is established between the first container instance and the at least one second container instance.

[0335] As shown in Figure 9, the first container image deploys a master-slave function module, which can be deployed in an operating system or a cloud service application. The second container image deploys a slave function application.

[0336] With reference to Figure 10, the configuration information may also include the version information of the slave function program corresponding to the second container instance. If the first container instance determines that the configuration information conforms to the second resource specification information and that the version information of the slave function program is compatible with the version information of the master-slave function module, it issues the certificate to be verified to the second container instance.
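The checks performed before a connection is established can be sketched as below. This is a sketch under loud assumptions: the format of the certificate is not specified in this description, so an HMAC token over the instance identifier is used purely as a stand-in; `SHARED_KEY`, the dictionary layout of the configuration information, and modeling version compatibility as membership in a set are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical shared secret, used only to make the sketch self-contained.
SHARED_KEY = b"demo-key"


def issue_certificate(instance_id: str) -> str:
    """Main side: issue a token (stand-in for the certificate to be verified)."""
    return hmac.new(SHARED_KEY, instance_id.encode(), hashlib.sha256).hexdigest()


def verify_certificate(instance_id: str, certificate: str) -> bool:
    """Slave side: verify the token received from the main function."""
    expected = issue_certificate(instance_id)
    return hmac.compare_digest(expected, certificate)


def handshake(expected_spec: str, compatible_versions: set, slave_config: dict) -> bool:
    """Return True if a connection may be established with this slave."""
    # 1. Configuration must conform to the second resource specification.
    if slave_config["spec"] != expected_spec:
        return False
    # 2. Slave program version must be compatible with the main side.
    if slave_config["version"] not in compatible_versions:
        return False
    # 3. Main issues the certificate; the slave verifies it.
    certificate = issue_certificate(slave_config["instance_id"])
    return verify_certificate(slave_config["instance_id"], certificate)


config = {"instance_id": "slave-1", "spec": "1*NVIDIA Tesla V100", "version": "1.2"}
print(handshake("1*NVIDIA Tesla V100", {"1.1", "1.2"}, config))  # True
print(handshake("8*NVIDIA Tesla V100", {"1.1", "1.2"}, config))  # False
```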

[0337] Based on the above exemplary description, it can be seen that the method proposed in this application embodiment, when the computing power resources required by the cloud service program need to be enhanced, also supports dynamically modifying the resource specifications of the function without recreating the function. Specifically, the server receives the third resource specification information uploaded by the first terminal device. Then, based on at least one second container image, at least one third container instance conforming to the third resource specification information is started; a connection is established between the first container instance and at least one third container instance; in response to the call request of the cloud service program, the computing tasks corresponding to the cloud service program are decomposed through the connection and distributed to the first container instance and at least one third container instance.

[0338] Based on the above exemplary description, it can be seen that the second resource specification information (such as the information displayed within the dashed box I21 shown in Figure 7) and the third resource specification information (such as the information displayed within the dashed box I33 shown in Figure 11) each include the number of slave functions and the resource specification corresponding to a single slave function.

[0339] Compared with the second resource specification information, the third resource specification information may differ in one of two ways: the resource specifications of a single slave function remain unchanged and only the number of slave functions changes; or the resource specifications of a single slave function change.

[0340] If it is determined that the resource specifications of a single slave function have changed, all instances in at least one second container instance are scaled down and deleted; based on the third resource specification information, at least one third container instance with the corresponding number of slave functions and the corresponding resource specifications of a single slave function is started; or, if it is determined that the resource specifications corresponding to a single slave function have not changed but the number of slave functions has changed, a second container instance is added or deleted to obtain at least one third container instance that meets the number of slave functions in the third resource specification information.

[0341] This application also provides a terminal device, including:

[0342] The front-end interaction module is used to display first resource specification information in response to a first user instruction to create a function; the first resource specification information describes the first resource specification corresponding to the main function created by the user; and, in response to a second user instruction to enable a slave function, displays second resource specification information; the second resource specification information describes the second resource specification corresponding to at least one slave function attached to the main function; the total computing power obtained based on the first resource specification and the second resource specification is greater than the computing power limit corresponding to a single physical machine;

[0343] The upload module is used to upload the first resource specification information and the second resource specification information to the server.

[0344] The server is used to obtain a first container image corresponding to a main function and at least one second container image corresponding to at least one slave function; the first container image deploys a cloud service program; the server is used to launch a first container instance conforming to first resource specification information based on the first container image, and launch at least one second container instance conforming to second resource specification information based on at least one second container image; establish a connection between the first container instance and at least one second container instance; and in response to a call request to the cloud service program, decompose the computing tasks corresponding to the cloud service program and distribute them to the first container instance and at least one second container instance through the connection.

[0345] This application embodiment also provides a server for obtaining a first container image corresponding to a main function and at least one second container image corresponding to at least one slave function; the first container image deploys a cloud service program.

[0346] The following modules can be deployed on the server:

[0347] The receiving module is used to receive first resource specification information and second resource specification information uploaded by the first terminal device; the first resource specification information is used to describe the first resource specification corresponding to the main function created by the user; the second resource specification information is used to describe the second resource specification corresponding to at least one slave function attached to the main function; the total computing power obtained based on the first resource specification and the second resource specification is greater than the computing power limit corresponding to a single physical machine.

[0348] The function management module is used to start a first container instance that conforms to the first resource specification information based on the first container image, and to start at least one second container instance that conforms to the second resource specification information based on at least one second container image.

[0349] The master-slave function interaction module is used to establish a connection between the first container instance and at least one second container instance, and in response to a call request to the cloud service program, decompose the computing tasks corresponding to the cloud service program and distribute them to the first container instance and at least one second container instance through the connection.

[0350] The master-slave function interaction module and function management module can be implemented in software or hardware. For example, the implementation of the master-slave function interaction module will be described below. Similarly, the implementation methods of other modules can refer to the implementation method of the master-slave function interaction module.

[0351] As an example of a software functional unit, a master-slave function interaction module may include code running on a compute instance. A compute instance may include at least one of a physical host (computing device), a virtual machine, or a container. Furthermore, the aforementioned compute instance may be one or more. For example, a master-slave function interaction module may include code running on multiple hosts / virtual machines / containers. It should be noted that the multiple hosts / virtual machines / containers used to run the code may be distributed within the same region or in different regions. Further, the multiple hosts / virtual machines / containers used to run the code may be distributed within the same availability zone (AZ) or in different AZs, each AZ comprising one or more geographically proximate data centers. Typically, a region may include multiple AZs.

[0352] Similarly, multiple hosts / virtual machines / containers used to run this code can be distributed within the same Virtual Private Cloud (VPC) or across multiple VPCs. Typically, a VPC is set up within a region. Communication between two VPCs within the same region, as well as between VPCs in different regions, requires a communication gateway to be set up within each VPC to enable interconnection between VPCs.

[0353] As an example of a hardware functional unit, a master-slave function interaction module may include at least one computing device, such as a server. Alternatively, a master-slave function interaction module may also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The aforementioned PLD may be implemented using a complex programmable logical device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.

[0354] The multiple computing devices included in the master-slave function interaction module can be distributed within the same region or in different regions. Similarly, they can be distributed within the same Availability Zone (AZ) or in different AZs, and within the same Virtual Private Cloud (VPC) or across multiple VPCs. These multiple computing devices can be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.

[0355] It should be noted that, in other embodiments, the master-slave function interaction module can be used to execute any step in the computing resource scheduling method, and the function management module can be used to execute any step in the computing resource scheduling method shown in, for example, Figure 5.

[0356] The steps implemented by the master-slave function interaction module and other modules can be specified as needed. Multiple modules, such as the master-slave function interaction module and the upload module, respectively implement different steps of the computing resource scheduling method shown in Figure 5, thereby realizing all the functions of the computing resource scheduling device.

[0357] As an example of a software functional unit, a computing resource scheduling device may include code running on computing instances. A computing instance can be at least one of a physical host (computing device), a virtual machine, a container, or another computing device. There may be one or more such computing instances. For example, the computing resource scheduling device may include code running on multiple hosts / virtual machines / containers. It should be noted that the multiple hosts / virtual machines / containers used to run the code can be distributed in the same region or in different regions. They can also be distributed in the same Availability Zone (AZ) or in different AZs, each AZ including one data center or multiple geographically proximate data centers. Typically, a region may include multiple AZs.

[0358] Similarly, multiple hosts / virtual machines / containers used to run this code can be distributed within the same VPC or across multiple VPCs. Typically, a VPC is set up within a single region. Communication between two VPCs within the same region, and between VPCs in different regions, requires a communication gateway to be set up within each VPC to enable interconnection between VPCs.

[0359] As an example of a hardware functional unit, a computing resource scheduling device may include at least one computing device, such as a server. Alternatively, the computing resource scheduling device may also be a device implemented using an ASIC or a PLD. The aforementioned PLD may be implemented using a CPLD, FPGA, GAL, or any combination thereof.

[0360] The computing resource scheduling device includes multiple computing devices that can be distributed within the same region or in different regions. Similarly, the computing devices can be distributed within the same Availability Zone (AZ) or in different AZs. Likewise, the computing devices can be distributed within the same Virtual Private Cloud (VPC) or across multiple VPCs. These multiple computing devices can be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.

[0361] This application also provides a computing device 500. As shown in Figure 15, the computing device 500 includes a bus 502, a processor 504, a memory 506, and a communication interface 508. The processor 504, the memory 506, and the communication interface 508 communicate with each other via the bus 502. The computing device 500 can be a server or a terminal device. It should be understood that this application does not limit the number of processors and memories in the computing device 500.

[0362] Bus 502 can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. Buses can be categorized as address buses, data buses, control buses, etc. For ease of representation, the bus 502 is represented by a single line in Figure 15, but this does not mean that there is only one bus or one type of bus. The bus 502 may include a path for transmitting information between various components of the computing device 500 (e.g., memory 506, processor 504, communication interface 508).

[0363] Processor 504 may include any one or more processors such as a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).

[0364] Memory 506 may include volatile memory, such as random access memory (RAM). Memory 506 may also include non-volatile memory, such as read-only memory (ROM), flash memory, hard disk drive (HDD), or solid state drive (SSD).

[0365] The memory 506 stores executable program code, and the processor 504 executes this executable program code to implement the functions of the aforementioned master-slave function interaction module and function management module, thereby implementing the computing resource scheduling method shown in Figure 5. That is, the memory 506 stores the instructions for executing the computing resource scheduling method shown in Figure 5.

[0366] The communication interface 508 uses transceiver modules, such as, but not limited to, network interface cards and transceivers, to enable communication between the computing device 500 and other devices or communication networks.

[0367] This application also provides a computing device cluster. The computing device cluster includes at least one computing device. The computing device can be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device can also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.

[0368] As shown in Figure 16, the computing device cluster includes at least one computing device 500. The memory 506 in one or more computing devices 500 within the computing device cluster may store the same instructions for executing the computing resource scheduling method shown in Figure 5.

[0369] In some possible implementations, the memory 506 of one or more computing devices 500 in the computing device cluster may each store only part of the instructions for executing the computing resource scheduling method shown in Figure 5. In other words, a combination of one or more computing devices 500 can jointly execute the instructions for the computing resource scheduling method shown in Figure 5.

[0370] It should be noted that the memory 506 in different computing devices 500 within the computing device cluster can store different instructions, each used to execute a portion of the functions of the computing resource scheduling device. That is, the instructions stored in the memory 506 of different computing devices 500 can implement the functions of one or more modules, such as the master-slave function interaction module and the function management module.
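As a rough illustration of the preceding paragraph, the following minimal Python sketch checks that the modules stored across the devices of a cluster together cover every required module. The module names, the device identifiers, and the `covers_all_modules` helper are hypothetical and are not taken from this application.

```python
# Hypothetical module names, chosen only for illustration.
REQUIRED_MODULES = {"master_slave_interaction", "function_management", "upload"}


def covers_all_modules(placement: dict[str, set[str]]) -> bool:
    """Return True if the modules stored across all devices cover every required module."""
    stored = set().union(*placement.values())
    return REQUIRED_MODULES <= stored


# Each device stores only part of the instructions (hypothetical placement).
placement = {
    "device_500A": {"master_slave_interaction"},
    "device_500B": {"function_management", "upload"},
}

if __name__ == "__main__":
    print(covers_all_modules(placement))
```

In this sketch no single device holds all modules, yet the cluster as a whole can execute the complete method.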

[0371] In some possible implementations, one or more computing devices in a computing device cluster can be connected via a network. This network can be a wide area network (WAN) or a local area network (LAN), etc. Figure 17 shows one possible implementation. As shown in Figure 17, two computing devices 500A and 500B are connected via a network. Specifically, they are connected to the network through the communication interface in each computing device. In this type of possible implementation, the memory 506 in computing device 500A stores instructions for executing the functions of a master-slave function interaction module. For example, the functions implemented by the master-slave function interaction module can be as described in the above exemplary description. Meanwhile, the memory 506 in computing device 500B stores instructions for executing slave function programs or other module functions.

[0372] The connection method of the computing device cluster shown in Figure 17 may reflect the consideration that the computing resource scheduling method provided in this application requires a large amount of data storage and computation, so the functions implemented by the master-slave function interaction module are handed over to the computing device 500B for execution.

[0373] It should be understood that the functions of computing device 500A shown in Figure 17 can also be performed by multiple computing devices 500. Similarly, the functions of computing device 500B can also be performed by multiple computing devices 500.

[0374] This application also provides another computing device cluster. For the connection relationships between the computing devices in this computing device cluster, reference can similarly be made to the connection method of the computing device cluster shown in Figure 17. The difference is that the memory 506 in one or more computing devices 500 within this computing device cluster can store the same instructions for executing the computing resource scheduling method in the embodiment corresponding to Figure 5 or Figure 10.

[0375] In some possible implementations, the memory 506 of one or more computing devices 500 in the computing device cluster may each store only part of the instructions for executing the computing resource scheduling method in the embodiment corresponding to any of Figures 5 to 12. In other words, a combination of one or more computing devices 500 can jointly execute the instructions for the computing resource scheduling method in the embodiment corresponding to any of Figures 5 to 12.

[0376] It should be noted that the memory 506 in different computing devices 500 within the computing device cluster can store different instructions for executing some functions of the computing resource scheduling device. In other words, the instructions stored in the memory 506 of different computing devices 500 can implement the functions of the computing resource scheduling device.

[0377] This application also provides a computer program product containing instructions. The computer program product may be a software or program product containing instructions, capable of running on a computing device or stored on any usable medium. When the computer program product runs on at least one computing device, it causes the at least one computing device to execute a computing resource scheduling method.

[0378] This application also provides a computer-readable storage medium. The computer-readable storage medium can be any available medium on which a computing device can store data, or a data storage device such as a data center that includes one or more available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state drive). The computer-readable storage medium includes instructions that instruct the computing device to execute a computing resource scheduling method.

[0379] This application also provides an electronic device, the electronic device comprising: a processor, the processor being configured to execute a computer program or instructions in a memory to implement the method as described in any of the above embodiments.

[0380] It should be noted that a processor can be a chip with computing capabilities, and is not limited to a central processing unit (CPU). For example, a processor can be a chip that includes one or more transistors, resistors, capacitors, and other circuit elements to perform a certain function; or it can be an integrated circuit in various packages that can implement the above methods.

[0381] This application also provides a chip system, including: a communication interface for inputting and / or outputting data; and a processor for executing a computer-executable program, causing a device equipped with the chip system to perform the methods described in any of the above embodiments.

[0382] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the protection scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for scheduling computing resources, characterized in that, The method includes: The terminal device responds to the first user instruction to create a function and displays the first resource specification information; the first resource specification information is used to describe the first resource specification corresponding to the main function created by the user. In response to a second user instruction to enable a slave function, the terminal device displays second resource specification information; the second resource specification information describes the second resource specification corresponding to at least one slave function attached to the main function; the total computing power obtained based on the first resource specification and the second resource specification is greater than the computing power limit corresponding to a single physical machine; The terminal device uploads the first resource specification information and the second resource specification information to the server; The server obtains the first container image corresponding to the main function and at least one second container image corresponding to the at least one slave function; the first container image contains a cloud service program. The server starts a first container instance conforming to the first resource specification information based on the first container image, starts at least one second container instance conforming to the second resource specification information based on the at least one second container image, and establishes a connection between the first container instance and the at least one second container instance. In response to a request to invoke the cloud service program, the server decomposes the computing task corresponding to the cloud service program and distributes it to the first container instance and the at least one second container instance through the connection.
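The decomposition and distribution step recited in claim 1 can be pictured with the following minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the round-robin split, the sum-of-squares workload, and the names `decompose`, `run_partial`, and `invoke` are all hypothetical, and real container instances would receive their shares over the established connection rather than through in-process calls.

```python
def decompose(items: list[int], instances: list[str]) -> dict[str, list[int]]:
    """Split work items across instances round-robin (an assumed policy)."""
    shares: dict[str, list[int]] = {name: [] for name in instances}
    for index, item in enumerate(items):
        shares[instances[index % len(instances)]].append(item)
    return shares


def run_partial(share: list[int]) -> int:
    """Stand-in for the work one container instance performs on its share."""
    return sum(x * x for x in share)


def invoke(items: list[int], instances: list[str]) -> int:
    """Decompose the task, run each share, and merge the partial results."""
    shares = decompose(items, instances)
    partials = {name: run_partial(share) for name, share in shares.items()}
    return sum(partials.values())


if __name__ == "__main__":
    # One first (master) instance and two second (slave) instances.
    print(invoke(list(range(10)), ["master", "slave-1", "slave-2"]))
```

The merged result equals what a single instance would compute on the whole input, which is the property any decomposition policy would need to preserve.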

2. The method as described in claim 1, characterized in that, The method further includes: The terminal device responds to a third user instruction to edit a created function and displays third resource specification information; the third resource specification information is obtained by the user modifying the second resource specification information and is used to describe the third resource specification corresponding to at least one slave function after modification; the computing power corresponding to the third resource specification is greater than the computing power corresponding to the second resource specification; The terminal device uploads the third resource specification information to the server; The server, based on the at least one second container image, starts at least one third container instance that conforms to the third resource specification information; and establishes a connection between the first container instance and the at least one third container instance. In response to a request to invoke the cloud service program, the server decomposes the computing task corresponding to the cloud service program through the connection and distributes it to the first container instance and the at least one third container instance.

3. A method for scheduling computing resources, characterized in that, The method is applied to a server, which is used to obtain a first container image corresponding to a main function and at least one second container image corresponding to at least one slave function; a cloud service program is deployed in the first container image; The method includes: The server receives first resource specification information and second resource specification information uploaded by the first terminal device; the first resource specification information is used to describe the first resource specification corresponding to the main function created by the user; the second resource specification information is used to describe the second resource specification corresponding to at least one slave function attached to the main function; the total computing power obtained based on the first resource specification and the second resource specification is greater than the computing power limit corresponding to a single physical machine; The server starts a first container instance that conforms to the first resource specification information based on the first container image, and starts at least one second container instance that conforms to the second resource specification information based on the at least one second container image. The server establishes a connection between the first container instance and at least one second container instance; In response to a request to invoke the cloud service program, the server decomposes the computing task corresponding to the cloud service program and distributes it to the first container instance and the at least one second container instance through the connection.

4. The method as described in claim 3, characterized in that, Before receiving the first resource specification information and the second resource specification information uploaded by the first terminal device, the method further includes: In response to the first acquisition request from the second terminal device, the server sends the first operating system image to the second terminal device. Wherein, the first operating system image is an operating system image integrating a master-slave function interaction module; the master-slave function interaction module is used to establish a connection between the first container instance and at least one second container instance, and in response to a call request for the cloud service program, decompose the computing tasks corresponding to the cloud service program, and distribute them to the first container instance and the at least one second container instance through the connection; The server receives the first container image uploaded by the second terminal device; The first container image is obtained by installing a cloud service program on the first operating system image; the cloud service program integrates a trained artificial intelligence (AI) model; the cloud service program is used to provide AI services to users by calling the AI model.

5. The method as described in claim 3, characterized in that, Before the server receives the first resource specification information and the second resource specification information uploaded by the first terminal device, the method further includes: In response to a second acquisition request from a second terminal device, the server sends a software development kit (SDK) integrating a master-slave function interaction module to the second terminal device. The master-slave function interaction module is used to establish a connection between the first container instance and at least one second container instance, and in response to a call request for the cloud service program, to decompose the computing tasks corresponding to the cloud service program and distribute them to the first container instance and the at least one second container instance through the connection. The server receives a first container image uploaded by the second terminal device; the first container image is obtained by installing a cloud service program in an operating system image; the cloud service program integrates the SDK and a trained AI model; the cloud service program is used to provide AI services to users by calling the AI model.

6. The method according to any one of claims 3-5, characterized in that, The server is equipped with a function management module. Before the server establishes a connection between the first container instance and at least one second container instance, the method further includes: The function management module transmits at least one set of network information corresponding to at least one second container instance to the first container instance; one set of information in the at least one set of network information includes one or more of the Internet Protocol IP address information and port number corresponding to a second container instance. The server establishes a connection between the first container instance and at least one second container instance, including: The server triggers the first container instance to initiate a connection request to at least one corresponding second container instance based on the at least one set of network information. Each of the at least one second container instance, in response to the connection request, obtains the configuration information corresponding to its own container instance; the configuration information includes resource specification information corresponding to a second container instance. If the first container instance determines that the configuration information conforms to the second resource specification information, it issues a certificate to be verified to the second container instance. The second container instance verifies the certificate. If the verification is successful, a connection is established between the first container instance and at least one second container instance.
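The connection sequence recited in claim 6 (configuration report, specification check, certificate issuance, certificate verification) can be sketched as follows. This is a minimal in-process illustration under stated assumptions: an HMAC token stands in for the certificate, the shared secret, the `Spec` fields, and all class and method names are hypothetical, and a real deployment would exchange these messages over the network using the IP address and port supplied by the function management module.

```python
import hashlib
import hmac
from dataclasses import dataclass

SHARED_SECRET = b"demo-secret"  # hypothetical key, for illustration only


@dataclass(frozen=True)
class Spec:
    cpu_cores: int
    memory_gb: int


def make_token(ip: str, port: int) -> bytes:
    """HMAC token used here as a stand-in for the certificate to be verified."""
    return hmac.new(SHARED_SECRET, f"{ip}:{port}".encode(), hashlib.sha256).digest()


class SlaveInstance:
    """Models a second container instance."""

    def __init__(self, ip: str, port: int, spec: Spec) -> None:
        self.ip, self.port, self.spec = ip, port, spec

    def report_config(self) -> Spec:
        # Returned in response to the connection request.
        return self.spec

    def verify(self, token: bytes) -> bool:
        return hmac.compare_digest(token, make_token(self.ip, self.port))


class MasterInstance:
    """Models the first container instance."""

    def __init__(self, expected_spec: Spec) -> None:
        self.expected_spec = expected_spec
        self.connected: list[tuple[str, int]] = []

    def connect(self, slave: SlaveInstance) -> bool:
        # Refuse instances whose configuration does not conform to the spec.
        if slave.report_config() != self.expected_spec:
            return False
        # Issue the token and let the slave verify it.
        if not slave.verify(make_token(slave.ip, slave.port)):
            return False
        self.connected.append((slave.ip, slave.port))
        return True
```

For example, a master created with `Spec(8, 32)` accepts a slave reporting `Spec(8, 32)` and rejects one reporting `Spec(4, 16)` before any token is issued.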

7. The method as described in claim 6, characterized in that, The first container image deploys a master-slave function module, and the second container image deploys a slave function program; The configuration information also includes version information of the slave function program corresponding to a second container instance; If the first container instance determines that the configuration information conforms to the second resource specification information, it issues a certificate to be verified to the second container instance, including: If the first container instance determines that the configuration information conforms to the second resource specification information and that the version information of the slave function program is compatible with the version information of the master-slave function module, it issues the certificate to be verified to the second container instance.

8. The method according to any one of claims 3-7, characterized in that, The method further includes: The server receives third resource specification information uploaded by the first terminal device; the third resource specification information is obtained by the user modifying the second resource specification information, and is used to describe the third resource specification corresponding to at least one modified slave function; the computing power corresponding to the third resource specification is greater than the computing power corresponding to the second resource specification. The server, based on the at least one second container image, starts at least one third container instance that conforms to the third resource specification information; and establishes a connection between the first container instance and the at least one third container instance. In response to a request to invoke the cloud service program, the server decomposes the computing task corresponding to the cloud service program through the connection and distributes it to the first container instance and the at least one third container instance.

9. The method as described in claim 8, characterized in that, Each of the second resource specification information and the third resource specification information includes the number of slave functions and the resource specification corresponding to a single slave function; Launching at least one third container instance that conforms to the third resource specification information includes: If the resource specifications of a single slave function change, all instances in the at least one second container instance are scaled down and deleted. Based on the third resource specification information, start at least one third container instance with the corresponding number of slave functions and the corresponding resource specification of a single slave function; or, If the resource specification corresponding to a single slave function has not changed and the number of slave functions has changed, add or delete the second container instance to obtain at least one third container instance that meets the number of slave functions in the third resource specification information.
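The two branches recited in claim 9 can be summarized with the following minimal Python sketch. The `SlaveSpec` type, the string label used for the per-function specification, and the `plan_rescale` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaveSpec:
    count: int          # number of slave functions
    per_function: str   # resource specification of a single slave function, e.g. "8C32G"


def plan_rescale(old: SlaveSpec, new: SlaveSpec) -> dict[str, int]:
    """Return how many container instances to delete and how many to start."""
    if new.per_function != old.per_function:
        # Per-function specification changed: delete all old instances, start fresh ones.
        return {"delete": old.count, "start": new.count}
    # Only the number of slave functions changed: add or delete the difference.
    delta = new.count - old.count
    if delta >= 0:
        return {"delete": 0, "start": delta}
    return {"delete": -delta, "start": 0}
```

For instance, going from two "8C32G" slave functions to three "16C64G" slave functions replaces every instance, whereas going from two to five "8C32G" slave functions only starts three additional instances.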

10. A method for scheduling computing resources, characterized in that, The method is applied to a terminal device, and the method includes: The terminal device responds to the first user instruction to create a function and displays the first resource specification information; the first resource specification information is used to describe the first resource specification corresponding to the main function created by the user. In response to a second user instruction to enable a slave function, the terminal device displays second resource specification information; the second resource specification information describes the second resource specification corresponding to at least one slave function attached to the main function; the total computing power obtained based on the first resource specification and the second resource specification is greater than the computing power limit corresponding to a single physical machine; The terminal device uploads the first resource specification information and the second resource specification information to the server.
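The condition that the total computing power exceeds the limit of a single physical machine can be illustrated with simple arithmetic. In the following minimal Python sketch, computing power is measured in CPU cores and the single-machine limit is set to 64 cores; both choices, along with the function names, are assumptions made only for illustration.

```python
SINGLE_MACHINE_LIMIT_CORES = 64  # hypothetical limit of one physical machine


def total_compute(master_cores: int, slave_cores: int, slave_count: int) -> int:
    """Total computing power of the main function plus its attached slave functions."""
    return master_cores + slave_cores * slave_count


def exceeds_single_machine(master_cores: int, slave_cores: int, slave_count: int) -> bool:
    """Check whether the combined specification exceeds the single-machine limit."""
    return total_compute(master_cores, slave_cores, slave_count) > SINGLE_MACHINE_LIMIT_CORES
```

Under these assumptions, a 32-core main function with two 32-core slave functions totals 96 cores, which is more than any single 64-core machine could provide.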

11. The method as described in claim 10, characterized in that, The method further includes: The terminal device responds to a third user instruction to edit a created function and displays third resource specification information; the third resource specification information is obtained by the user modifying the second resource specification information and is used to describe the third resource specification corresponding to at least one slave function after modification; the computing power corresponding to the third resource specification is greater than the computing power corresponding to the second resource specification; The terminal device uploads the third resource specification information to the server.

12. An electronic device, characterized in that, The electronic device includes a processor for executing a computer program or instructions in a memory to implement the method as described in any one of claims 1-11.

13. A computing device cluster, characterized in that, It includes at least one computing device, each computing device including a processor and memory; The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device to cause the cluster of computing devices to perform the method as described in any one of claims 1-11.

14. A computer program product containing instructions, characterized in that, When the instruction is executed by the computing device cluster, the computing device cluster performs the method as described in any one of claims 1-11.

15. A computer-readable storage medium, characterized in that, It includes computer program instructions, which, when executed by a cluster of computing devices, perform the method as described in any one of claims 1-11.