Large model calling method, computer device, readable storage medium and program product
By using a unified call request template and large model call request parameters, the request type and resource information are determined, and the target large model is matched for the call. This solves the problem of low efficiency in large model calls and enables rapid access and efficient calls.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SF TECH CO LTD
- Filing Date
- 2024-12-18
- Publication Date
- 2026-06-19
AI Technical Summary
Existing methods for calling large models configure and integrate each model independently, which makes accessing and calling large models inefficient for enterprises.
By setting a unified call request template, the request type and resource request information are determined using the large model call request parameters, and the target large model is matched for the call, thus achieving rapid integration.
It improves the efficiency of large model invocation and reduces the time and resource consumption during the access and invocation process.
Figure CN122240210A_ABST
Description
Technical Field
[0001] This application relates to the field of artificial intelligence technology, and in particular to a method, apparatus, computer device, computer-readable storage medium, and computer program product for calling large models.
Background Technology
[0002] With the development of artificial intelligence technology, various large-scale AI models exist. These models are based on deep learning and can learn and process massive amounts of data. For enterprises, to ensure normal operation, it is necessary to use these large models for related business processing, meaning they need to be integrated into the enterprise's internal systems for invocation. Currently, the common method for invoking large models is to configure each model independently. However, given the wide variety of large models available, this method of independently configuring and integrating each model leads to reduced efficiency in large-scale model invocation.
[0003] Therefore, the current large model calling method suffers from low calling efficiency.
Summary of the Invention
[0004] In view of the above technical problem, it is necessary to provide a method, apparatus, computer device, computer-readable storage medium, and computer program product for calling large models that can improve calling efficiency.
[0005] In a first aspect, this application provides a method for calling a large model, the method comprising:
[0006] Obtain the large model call request parameters, input the large model call request parameters into a preset call template, and obtain the corresponding call request;
[0007] Based on the call request, determine the request type and the corresponding resource request information;
[0008] Obtain the resource information corresponding to the request type from each of the large models to be invoked, and determine the target large model based on the resource request information and each of the resource information;
[0009] The target large model is invoked according to the invocation request, and the invocation result output by the target large model based on the invocation request is obtained.
[0010] In one embodiment, determining the request type and corresponding resource request information based on the invocation request includes:
[0011] Obtain the parameter type of the large model call request parameter in the call request;
[0012] The request type of the call request is determined based on the parameter type, and the computing resource request information and numerical resource request information corresponding to the request type are obtained as the resource request information.
[0013] In one embodiment, determining the request type of the call request based on the parameter type, and obtaining the computing resource request information and numerical resource request information corresponding to the request type as the resource request information, includes:
[0014] If the parameter type is text, then the request type is determined to be the first type, and the first computing resource request information and the first numerical resource request information corresponding to the first type are obtained;
[0015] If the parameter type is an image, then the request type is determined to be the second type, and the second computing resource request information and the second numerical resource request information corresponding to the second type are obtained.
[0016] In one embodiment, obtaining the resource information corresponding to the request type in each of the various large models to be invoked includes:
[0017] For each of the large models, if the request type is the first type, obtain the first computing resource information and the first numerical resource information from the preset configuration model;
[0018] If the request type is the second type, obtain the second computing resource information and the second numerical resource information from the preset configuration model.
[0019] In one embodiment, determining the target large model based on the resource request information and each of the resource information includes:
[0020] For each of the large models, obtain the first comparison result between the current number of requests corresponding to the large model and the preset request frequency threshold;
[0021] The resource call frequency is determined based on the currently occupied resource information and the resource request information in the resource information, and a second comparison result between the resource call frequency and the preset resource call frequency threshold is obtained;
[0022] Based on the first comparison result, the second comparison result, and the preset weights corresponding to the large model, the model score corresponding to the large model is determined;
[0023] The target large model is determined based on the model scores corresponding to the respective large models.
[0024] In one embodiment, the step of invoking the target large model according to the invocation request and obtaining the invocation result output by the target large model based on the invocation request includes:
[0025] Obtain the requesting user information corresponding to the call request, and obtain the model identifier corresponding to the target large model;
[0026] Generate a request structure based on the invocation request, the requesting user information, and the model identifier;
[0027] According to the calling interface corresponding to the target large model, the request structure is input into the target large model, and the target large model outputs the corresponding return parameters based on the request structure;
[0028] Based on the returned parameters, the resource request information, and the model identifier of the target large model, the call result corresponding to the call request is generated.
[0029] In one embodiment, after invoking the target large model according to the invocation request and obtaining the invocation result output by the target large model based on the invocation request, the method further includes:
[0030] The call results are recorded by the Kafka middleware in the gateway layer to obtain the recorded call results.
[0031] The recorded call results are pushed to the management terminal.
[0032] In a second aspect, this application provides a large model calling device, the device comprising:
[0033] The acquisition module is used to acquire the large model call request parameters, input the large model call request parameters into a preset call template, and obtain the corresponding call request;
[0034] The first determining module is used to determine the request type and the corresponding resource request information based on the call request;
[0035] The second determining module is used to obtain the resource information corresponding to the request type in each of the various large models to be called, and determine the target large model based on the resource request information and each of the resource information.
[0036] The calling module is used to call the target large model according to the calling request, and obtain the calling result output by the target large model based on the calling request.
[0037] In a third aspect, this application provides a computer device including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the above-described method.
[0038] In a fourth aspect, this application provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the above-described method.
[0039] In a fifth aspect, this application provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the above-described method.
[0040] The aforementioned large model invocation method, apparatus, computer device, computer-readable storage medium, and computer program product obtain an invocation request through large model invocation request parameters and a preset invocation template, determine the request type and corresponding resource request information, determine the target large model based on the resource request information and the resource information corresponding to the request type in each large model, and invoke the target large model according to the invocation request to obtain the invocation result output by the target large model. Compared to the traditional method of configuring and accessing each large model independently, this solution sets a unified invocation request template, uses the invocation request parameters to determine the request type and resource request information, and matches the resource request information against the corresponding resource information of each large model to identify and invoke a target large model that meets the conditions, thereby achieving rapid access to large models and improving the efficiency of large model invocation.
Description of the Drawings
[0041] To more clearly illustrate the technical solutions in the embodiments of this application or related technologies, the drawings used in the description of the embodiments of this application or related technologies will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.
[0042] Figure 1 is a flowchart of a large model calling method in one embodiment;
[0043] Figure 2 is a schematic diagram of the structure of a preset call template in one embodiment;
[0044] Figure 3 is a flowchart of the traffic recording steps in one embodiment;
[0045] Figure 4 is a block diagram of the structure of a large model calling system in one embodiment;
[0046] Figure 5 is a structural block diagram of a large model calling device in one embodiment;
[0047] Figure 6 is an internal structural diagram of a computer device in one embodiment.
Detailed Implementation
[0048] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0049] In one embodiment, as shown in Figure 1, a large model invocation method is provided. This embodiment is described using the example of applying the method to a server. It is understood that the method can also be applied to a terminal, or to a system including a terminal and a server, where it is implemented through interaction between the terminal and the server. The method includes the following steps S202 to S208:
[0050] Step S202: Obtain the large model call request parameters, input the large model call request parameters into the preset call template, and obtain the corresponding call request.
[0051] Here, "large models" can refer to large-scale artificial intelligence models, specifically deep learning-based models, which can learn and process massive amounts of data to make predictions and generate results. Enterprises need to integrate various large models and make them available internally. However, with the proliferation of large models, integrating each model requires multiple steps, hindering rapid integration and invocation. Therefore, servers can utilize native or non-native components to abstract from the gateway layer to the program level, optimizing the call chain and improving the efficiency of calling large models.
[0052] When a user needs to invoke a large model, they can log in to the server with their user account and then input the corresponding large model call request parameters. These parameters are the inputs on which the large model performs prediction and recognition, and can be text, images, or a combination of both. The server can pre-build a preset call template for large model invocations, fill the user's large model call request parameters into it, and obtain the call request from the filled-in template.
[0053] Specifically, Figure 2 is a schematic diagram of a preset call template in one embodiment. The server can generate the preset call template based on Gozero code. For example, the server generates a pre-compiled empty project and some common code based on Gozero, and reserves specific methods to be filled in. These include the Doverify, DoCanUse, GetPrice, Core, and PostAfter modules. The Doverify module implements verification logic such as parameter validation and security interception; the DoCanUse module implements the check of whether the user has enabled large model calls; GetPrice obtains the resource information of the called model, such as price and other numerical resource information; Core implements the connection to the specific streaming and non-streaming interfaces and resource scheduling; PostAfter implements post-processing, such as updating the resources of the large model. The user can then simply input the large model call request parameters to call a large model through the template. The server determines the logic for calling the large model from the preset call template and then calls the large model based on the call request.
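The five hooks described above follow the shape of a template method. The following is a minimal Python sketch of that shape, not the Gozero-generated code itself: the class names `CallTemplate` and `EchoTemplate`, the snake_case method names, the per-character price, and the echo behaviour are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class CallTemplate(ABC):
    """Fixed call pipeline; a concrete model adapter fills in the five hooks."""

    def run(self, user: str, params: dict) -> dict:
        self.do_verify(params)                # Doverify: parameter validation
        if not self.do_can_use(user):         # DoCanUse: has the user enabled calls?
            raise PermissionError(f"{user} has not enabled large model calls")
        price = self.get_price(params)        # GetPrice: numerical resource information
        result = self.core(params)            # Core: connect to the model interface
        self.post_after(user, price, result)  # PostAfter: update resource usage
        return result

    @abstractmethod
    def do_verify(self, params: dict) -> None: ...

    @abstractmethod
    def do_can_use(self, user: str) -> bool: ...

    @abstractmethod
    def get_price(self, params: dict) -> int: ...

    @abstractmethod
    def core(self, params: dict) -> dict: ...

    @abstractmethod
    def post_after(self, user: str, price: int, result: dict) -> None: ...


class EchoTemplate(CallTemplate):
    """Hypothetical adapter that echoes the prompt instead of calling a model."""

    def __init__(self, enabled_users):
        self.enabled_users = set(enabled_users)
        self.usage = {}  # user -> accumulated numerical resources

    def do_verify(self, params):
        if not params.get("prompt"):
            raise ValueError("prompt is required")

    def do_can_use(self, user):
        return user in self.enabled_users

    def get_price(self, params):
        return len(params["prompt"])  # assumed price: one unit per character

    def core(self, params):
        return {"output": params["prompt"].upper()}

    def post_after(self, user, price, result):
        self.usage[user] = self.usage.get(user, 0) + price
```

Calling `EchoTemplate({"alice"}).run("alice", {"prompt": "hi"})` walks the five hooks in order; integrating another model would only require a new subclass, while the order of the steps stays fixed in `run`.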
[0054] Step S204: Determine the request type and corresponding resource request information based on the call request.
[0055] The aforementioned call requests include various request types, which are determined based on the different types of call request parameters. Resource request information includes the resources required to fulfill the requirements of the call request, such as the model size of the large model to be used, the number of characters to be generated, and the number of images to be generated. The server can determine the request type and the corresponding resource request information for that request type based on the aforementioned call requests. The server can parse the call requests to obtain the desired request type when the user inputs large model call parameters, and then determine the corresponding resource request information for that request type.
[0056] Step S206: Obtain the resource information corresponding to the request type in each of the large models to be called, and determine the target large model based on the resource request information and the resource information.
[0057] The server can pre-connect to multiple large models, which may include models with different resource configurations and user configurations. Resource configurations represent the resources provided by the large model, and user configurations represent the types of users who can use that large model. The server can obtain resource information corresponding to the request type from each of the aforementioned large models to be invoked. This resource information may include multiple types of resource information, allowing the server to obtain the corresponding resource information for each type within each large model based on the request type. Therefore, based on the resource request information and the corresponding resource information of each large model, the server can select the target large model that meets the requirements.
[0058] The large models consume resources when predicting and identifying call requests. The amount of resources consumed varies depending on the difficulty of the call request, and each large model has a preset weight. When the server selects large models, it can determine the selection based on the matching degree between resource request information and resource information type, the computational load of the large model, and the preset weight of the large model.
[0059] Each large model can be accessed through an interface that it provides. The server can include an open layer, which is an open platform with API (Application Programming Interface) publishing and subscription capabilities. Users or systems can apply for and use these APIs. For example, APIs can be published, subscribed to, and approved per system code. Since different models require different numerical resource information, the server can publish APIs by model version number. For example, large models of the gpt-3.5 series can be published as one interface, and large models of the gpt-4 series as another.
[0060] Step S208: Invoke the target large model according to the call request, and obtain the call result output by the target large model based on the call request.
[0061] The server can invoke the selected target large model based on the aforementioned invocation request. For example, the server can input relevant parameters into the target large model based on the invocation request, and the target large model can then perform corresponding predictions and recognitions based on the input parameters to obtain the corresponding invocation results. Thus, the server can obtain the invocation results output by the target large model based on the invocation request. These invocation results can include various forms, such as at least one of text-based and image-based invocation results from the target large model. Furthermore, in some embodiments, the server can implement the invocation of various large models through highly abstract code, such as using factory patterns, strategy patterns, and template patterns to jointly complete the invocation of large models.
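One way the factory and strategy patterns mentioned above could fit together is sketched below in Python. This is an illustrative assumption rather than the implementation described in this application: the registry, the `register` decorator, the adapter classes, and the model identifiers `text-model-a` and `image-model-b` are all hypothetical.

```python
from typing import Callable, Dict, Protocol


class ModelAdapter(Protocol):
    """Strategy interface: every adapter exposes the same invoke method."""

    def invoke(self, request: dict) -> dict: ...


# Registry mapping a model identifier to a constructor for its adapter.
_REGISTRY: Dict[str, Callable[[], ModelAdapter]] = {}


def register(model_id: str):
    """Class decorator that records an adapter under a model identifier."""
    def wrap(cls):
        _REGISTRY[model_id] = cls
        return cls
    return wrap


@register("text-model-a")
class TextAdapter:
    def invoke(self, request):
        return {"kind": "text", "output": f"echo: {request['prompt']}"}


@register("image-model-b")
class ImageAdapter:
    def invoke(self, request):
        return {"kind": "image", "output": f"caption for {request['image']}"}


def create_adapter(model_id: str) -> ModelAdapter:
    """Factory: build the adapter (strategy) for the selected target model."""
    try:
        return _REGISTRY[model_id]()
    except KeyError:
        raise LookupError(f"no adapter registered for {model_id}") from None
```

With this arrangement the calling code only needs the identifier of the target large model; adding a model means registering one more adapter class.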
[0062] In the aforementioned large model invocation method, the invocation request is obtained through the large model invocation request parameters and a preset invocation template. The request type and corresponding resource request information are determined. Based on the resource request information and the resource information corresponding to the request type in each large model, the target large model is determined. The target large model is then invoked according to the invocation request, and the invocation result output by the target large model based on the invocation request is obtained. Compared to the traditional method of configuring and integrating each large model independently, this solution sets a unified invocation request template, uses the invocation request parameters of the large model to determine the request type and resource request information, and matches the resource request information with the corresponding resource information in each large model to determine the target large model that meets the conditions and invoke it. This achieves rapid integration of large models and improves the efficiency of large model invocation.
[0063] In an exemplary embodiment, determining the request type and corresponding resource request information based on the call request includes: obtaining the parameter type of the large model call request parameters in the call request; determining the request type of the call request based on the parameter type; and obtaining the computational resource request information and numerical resource request information corresponding to the request type as resource request information.
[0064] In this embodiment, the aforementioned call request can be generated based on the large model call request parameters. The server can then determine the request type based on these parameters. For example, after obtaining the parameter types of the large model call request parameters, the server determines the request type based on these parameter types. For instance, the server can detect whether the large model call parameters are text or image types, and based on the request type, obtain the corresponding computational resource request information and numerical resource request information as the resource request information required for the call request. The computational resource information represents the computational resources consumed by the large model when predicting the call request, such as the call frequency; the numerical resource information represents the numerical resources required to call the large model, which can be deducted from the user's account. That is, the user's account pre-contains a corresponding amount of numerical resources.
[0065] The parameter types mentioned above include text and images. For different types of large model call request parameters, the numerical and computational resources required for prediction and processing by the large model may differ. Therefore, the server can determine the corresponding computational and numerical resource request information based on the different types of large model call request parameters.
[0066] In one embodiment, determining the request type of the call request based on the parameter type and obtaining the computing resource request information and numerical resource request information corresponding to the request type as resource request information includes: if the parameter type is text, then determining the request type as a first type and obtaining the first computing resource request information and the first numerical resource request information corresponding to the first type; if the parameter type is image, then determining the request type as a second type and obtaining the second computing resource request information and the second numerical resource request information corresponding to the second type.
[0067] In this embodiment, the server can detect the parameter type of the large model call request parameters. If the server detects that the parameter type is text, the server can determine that the request type is the first type, where the first type represents the text type. The server can obtain the first computing resource request information and the first numerical resource request information corresponding to the first type, thereby obtaining the first resource request information corresponding to the first type.
[0068] If the server detects that the parameter type is an image, it can determine that the request type is the second type, where the second type represents the image type. The server can then obtain the second computing resource request information and the second numerical resource request information corresponding to the second type, thereby obtaining the second resource request information corresponding to the second type. The number of resources requested by the first resource request information and the second resource request information may be different.
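The mapping from parameter type to request type and resource request information can be sketched as follows. The Python types used to detect text and images, the four-characters-per-token estimate, and the unit prices are assumptions chosen only to make the example concrete; this application does not specify them.

```python
from dataclasses import dataclass
from enum import Enum


class RequestType(Enum):
    FIRST = "text"    # first type: text parameters
    SECOND = "image"  # second type: image parameters


@dataclass(frozen=True)
class ResourceRequest:
    request_type: RequestType
    compute_units: int  # computing resource request, e.g. estimated tokens
    numeric_units: int  # numerical resource request, e.g. estimated cost


# Assumed constants, for illustration only.
TEXT_UNIT_PRICE = 2     # numerical units per estimated token
IMAGE_TOKENS = 100      # computing units reserved per image
IMAGE_UNIT_PRICE = 500  # numerical units per image


def classify(param) -> ResourceRequest:
    """Map a call request parameter to its request type and resource request."""
    if isinstance(param, str):
        tokens = max(1, len(param) // 4)  # rough estimate: 4 characters per token
        return ResourceRequest(RequestType.FIRST, tokens, tokens * TEXT_UNIT_PRICE)
    if isinstance(param, (bytes, bytearray)):
        return ResourceRequest(RequestType.SECOND, IMAGE_TOKENS, IMAGE_UNIT_PRICE)
    raise TypeError(f"unsupported parameter type: {type(param).__name__}")
```

For example, `classify("hello world!")` yields a first-type request with 3 computing units and 6 numerical units, while any `bytes` payload is treated as a second-type (image) request.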
[0069] After the server determines the resource request information corresponding to the call request, it can compare it with the resource information corresponding to each of the various large models to determine the target large model. Each large model has different resource limits for different types of requests; therefore, for different types of requests, the server obtains the corresponding resource information from the preset configuration model.
[0070] In one embodiment, obtaining the resource information corresponding to the request type in each of the large models to be invoked includes: for each large model, if the request type is a first type, obtaining the first computing resource information and the first numerical resource information from the preset configuration model; if the request type is a second type, obtaining the second computing resource information and the second numerical resource information from the preset configuration model.
[0071] In this embodiment, there are multiple large models. The preset configuration model includes resource information of different types corresponding to each large model, as well as the quota and name of each type of resource information. The first type can be a text type, and the second type can be an image type. For each large model, when the server detects that the request type is the first type, it can obtain the first computing resource information and the first numerical resource information corresponding to that large model from the preset configuration model. The first computing resource information represents the quota of computing resources for that large model when processing a call request of the first type, and the first numerical resource information represents the numerical resource requirements for that large model when processing a call request of the first type.
[0072] For each large model, when the server detects that the request type is the second type, it retrieves the corresponding second computing resource information and second numerical resource information from the preset configuration model. The second computing resource information represents the quota of computing resources available to the large model when processing call requests of the second type, and the second numerical resource information represents the numerical resource requirements of the large model when processing call requests of the second type.
[0073] Specifically, the server can acquire various computing resource and numerical resource information corresponding to the aforementioned large model by pre-constructing a preset configuration model. This preset configuration model can include a numerical resource configuration model, a computing resource configuration model, and a user configuration model. The numerical resource configuration model records unit numerical resource information for two types of data: first, text generated from text; and second, text generated from images. This is to address scenarios where numerical resource calculations are performed based on call frequency or the number of images. The computing resource configuration includes the allocation of computing resources corresponding to the large model's prediction, including available and currently occupied resource information. The server prioritizes specific models based on weights. The user configuration model can also manage user permissions to access a product's large model and manage used computing resource information, such as the allocated quota and the number of images.
[0074] The numerical resource configuration can specifically include: {product_name (product name); model_name (model name); model_version (model version number); type (text-to-text, image-to-text); input_price (input unit numerical resource information); out_price (output unit numerical resource information); is_delete (whether it is effective)}.
The computational resource configuration can specifically include: {_api_key (key in the area); model_type (model type); resource_name (resource name); deployment_id (source deployment identifier); model_version (model version number); api_version (api version number); rpm (requests per minute); tpm (resources consumed per minute); current_rpm (currently used calls); current_tpm (currently used resources); weight_tokens (number of used numerical resources); is_delete (whether it is effective)}.
The user configuration model can include: {system_code (system code); model (model name); used_token (used computational resources); used_image (used images); is_delete (whether it is effective)}.
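As a rough illustration of how these configuration records might be queried by request type, the Python sketch below models a subset of the listed fields and looks up the computing and numerical resource information for one model version. Several choices are assumptions not stated in this application: that `model_type` carries the same label as `type`, that the two tables are joined on `model_version`, that `is_delete` set to true means the row is not in effect, and the `TYPE_LABEL` mapping and `lookup` helper themselves.

```python
from dataclasses import dataclass


@dataclass
class NumericalResourceConfig:
    model_name: str
    model_version: str
    type: str          # "text-to-text" or "image-to-text"
    input_price: int   # input unit numerical resource information
    out_price: int     # output unit numerical resource information
    is_delete: bool = False


@dataclass
class ComputeResourceConfig:
    model_type: str    # assumed to hold the same label as `type` above
    model_version: str
    rpm: int           # requests per minute
    tpm: int           # resources consumed per minute
    current_rpm: int = 0
    current_tpm: int = 0
    is_delete: bool = False


# Assumed mapping from request type to the label stored in the tables.
TYPE_LABEL = {"first": "text-to-text", "second": "image-to-text"}


def lookup(prices, computes, model_version, request_type):
    """Return (compute row, price row) for a model version and request type."""
    label = TYPE_LABEL[request_type]
    compute = next((c for c in computes
                    if c.model_version == model_version
                    and c.model_type == label and not c.is_delete), None)
    price = next((p for p in prices
                  if p.model_version == model_version
                  and p.type == label and not p.is_delete), None)
    return compute, price
```

A missing or disabled row comes back as `None`, which lets the caller skip that model when it gathers candidates.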
[0075] Through the above embodiments, the server can identify the parameter type of the large model call request parameters in the call request, and obtain the corresponding type of computing resource information and numerical resource information from the corresponding position in the preset configuration model according to the different parameter types. Then, the server can process the corresponding call request based on the corresponding type of computing resource information and numerical resource information, so that users can quickly call the large model and improve the efficiency of large model access and call.
[0076] In one embodiment, determining the target large model based on resource request information and various resource information includes: for each large model, obtaining a first comparison result between the current number of requests corresponding to the large model and a preset request frequency threshold; determining the resource call frequency based on the currently occupied resource information and resource request information in the resource information, and obtaining a second comparison result between the resource call frequency and a preset resource call frequency threshold; determining the model score corresponding to the large model based on the first comparison result, the second comparison result, and the preset weight corresponding to the large model; and determining the target large model based on the model scores corresponding to each large model.
[0077] In this embodiment, the server can jointly evaluate the resource request information and the resource information to determine the target large model. Multiple large models can be associated with the server. For each large model, the server can perform a first comparison between the current number of requests for that large model and the preset request frequency threshold to obtain a first comparison result. The current number of requests represents the number of times the large model has been requested and invoked, and the preset request frequency threshold represents the maximum number of times the large model is allowed to be invoked within a preset time period. The preset request frequency threshold is associated with the large model, and the first comparison result indicates how the current number of requests compares with that threshold.
[0078] The server can also obtain the currently occupied resource information from the aforementioned resource information, and determine the resource call frequency based on the currently occupied resource information and the resource request information. For example, the server can add the currently occupied resource information to the resource request information to obtain the current resource call frequency. The server can then perform a second comparison between the corresponding resource call frequency of the large model and a preset resource call frequency threshold to obtain a second comparison result. Here, the currently occupied resource information represents the resource information currently occupied in the large model; the resource call frequency represents the sum of the currently occupied resource information and the resource request information required by the call request; and the preset resource call frequency threshold is a value pre-set in the computational resource configuration model, representing the maximum value of resources allowed to be used by the large model within a preset time period.
[0079] The aforementioned large models have preset weights. Based on the first comparison result, the second comparison result, and the preset weights, the server can determine the model score corresponding to each of these large models. Therefore, the server can determine the target large model based on the model scores of each large model. For example, the server can select the large model with the highest model score as the target large model.
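The scoring step can be illustrated with a minimal sketch. The text does not specify how the two comparison results and the preset weight are combined, so the rule below is an assumption made only for illustration: a model that fails either comparison scores 0, and otherwise its score equals its preset weight. The names ModelState, model_score, and pick_target are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelState:
    name: str
    current_requests: int    # current number of requests
    request_limit: int       # preset request frequency
    occupied_resources: int  # currently occupied resource information
    resource_limit: int      # preset resource call frequency threshold
    weight: float            # preset weight


def model_score(m: ModelState, requested_resources: int) -> float:
    # First comparison: is there room for one more request?
    first_ok = m.current_requests < m.request_limit
    # Resource call frequency = occupied resources + requested resources.
    resource_call_frequency = m.occupied_resources + requested_resources
    # Second comparison: does the sum stay within the threshold?
    second_ok = resource_call_frequency <= m.resource_limit
    # Assumed combination rule (not from the source): weight if both pass, else 0.
    return m.weight if (first_ok and second_ok) else 0.0


def pick_target(models: List[ModelState], requested_resources: int) -> Optional[ModelState]:
    # Select the model with the highest score; None if no model qualifies.
    best: Optional[ModelState] = None
    best_score = 0.0
    for m in models:
        score = model_score(m, requested_resources)
        if score > best_score:
            best, best_score = m, score
    return best
```

Under this assumed rule, a model with a large weight but no spare capacity is never chosen, which matches the stated intent of selecting a model with the highest score among those that can serve the request.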
[0080] Specifically, the server can perform parameter validation and resource request information estimation at the input management level, allowing it to select a large model that has sufficient resources available at the current time. Obtaining the resource request information in advance can also help product or R&D teams implement specific functions. The server can use the system's security interface to intercept abnormal call requests, and can perform resource load balancing control across the large models. For example, before each call, the server pre-calculates the number of tokens in the current request (pre_token), i.e., the resource request information, and then finds the large models in the configuration table that satisfy the following condition: (current_rpm + 1 <= rpm) and (current_token + pre_token <= tpm). Here, current_rpm represents the current number of requests for the large model, rpm represents the preset request frequency, current_token represents the resource information currently occupied by the large model, pre_token represents the resource request information, and tpm represents the preset resource call frequency threshold. The server can then combine this with the preset weights to filter the large model resources, obtain the target large model, and use the call request to initiate a call to the target large model. In this process, the server program needs to keep these counters refreshed correctly, and because concurrent requests contend for the same resources, the server can use locks or transactions to control access.
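The condition (current_rpm + 1 <= rpm) and (current_token + pre_token <= tpm), together with the need for locking, can be sketched as follows. This is an illustrative sketch rather than the patented implementation: the class names ModelConfig and Balancer are hypothetical, the weighted random choice is one assumed way of "combining with preset weights", weights are assumed to be positive, and the periodic reset of the per-window counters is omitted.

```python
import random
import threading
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ModelConfig:
    model_type: str
    rpm: int               # preset request frequency for the window
    tpm: int               # preset resource (token) threshold for the window
    weight: float          # preset weight, assumed > 0
    current_rpm: int = 0   # requests counted so far in the window
    current_token: int = 0 # tokens occupied so far in the window


class Balancer:
    def __init__(self, configs: Iterable[ModelConfig],
                 rng: Optional[random.Random] = None) -> None:
        self._configs = list(configs)
        self._lock = threading.Lock()
        self._rng = rng or random.Random()

    def reserve(self, pre_token: int) -> Optional[str]:
        # The lock makes "check the condition, then update the counters"
        # atomic, so concurrent requests cannot both take the last slot.
        with self._lock:
            eligible = [
                c for c in self._configs
                if c.current_rpm + 1 <= c.rpm
                and c.current_token + pre_token <= c.tpm
            ]
            if not eligible:
                return None
            # Assumed policy: weighted random choice among eligible models.
            chosen = self._rng.choices(
                eligible, weights=[c.weight for c in eligible], k=1
            )[0]
            chosen.current_rpm += 1
            chosen.current_token += pre_token
            return chosen.model_type
```

In a multi-process deployment the in-process lock would not be enough; the counters would need to live in shared storage with a transaction or distributed lock, which is consistent with the text's mention of locks or transactions.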
[0081] In this embodiment, the server can use a weighted selection process based on the current resource usage, resource request information, and model request frequency to select the target large model for invocation, thereby improving the efficiency of invoking large models.
[0082] In one embodiment, invoking a target large model according to a call request and obtaining the call result output by the target large model based on the call request includes: obtaining the requesting user information corresponding to the call request and obtaining the model identifier corresponding to the target large model; generating a request structure based on the call request, the requesting user information, and the model identifier; inputting the request structure into the target large model according to the call interface corresponding to the target large model, and having the target large model output the corresponding return parameters based on the request structure; and generating the call result corresponding to the call request based on the return parameters, resource request information, and the model identifier of the target large model.
[0083] In this embodiment, the server can invoke the target large model based on a structure. The server can obtain the requesting user information corresponding to the invocation request, which may be the information of the user initiating the invocation request. The server can also obtain the model identifier corresponding to the target large model, thus enabling the server to generate a request structure based on the invocation request, the requesting user information, and the model identifier.
[0084] The aforementioned target large model can be invoked through a corresponding API. The server can input a request structure into the target large model using the API, and the target large model will output corresponding return parameters based on the request structure. The server can obtain the returned parameters and, based on the returned parameters, resource request information, and the target large model's model identifier, generate the invocation result corresponding to the invocation request.
[0085] Specifically, the server exposes its interface to internal departments and systems. To keep the interface as close as possible to each model's native interface, the server only performs necessary validations and collects user information; all other parameters are passed through transparently. For example, the call is made by passing a request structure as the input to the large model. The request structure can be represented as: {"appCode":"test123","userCode":"123456","systemCode":"INC-NLP-LLM","modelType":"gemini-1.0-pro","input":{ // native parameters }}. Here, appCode represents the application identifier, userCode represents the requesting user information, systemCode represents the system identifier, modelType represents the requested interface, and native parameters represent the parameters for the large model call request. After the server inputs the above structure into the corresponding target large model, it can generate the call result based on the return parameters of the target large model. The call result can be a response structure, specifically represented as: {"data":{"costToken":17,"msgId":344077,"billId":344077,"supplyType":1,"keyId":616,"productName":"openai","response":{ // native return }},"status":"SUCCESS","message":"","code":0}. Here, costToken represents the resource information used by the target large model to execute the call request, msgId represents the identifier of the call result, billId represents the bill identifier of the numerical resource information required by the target large model, productName represents the type of large model used, the native return represents the return parameters output by the target large model, and status represents the success or failure status of the call.
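The request and response structures described above can be assembled as in the following sketch. The field names are taken from the example structures in the text; the function names build_request and build_result and the contents of the native parameters are hypothetical, since the text leaves the native payload to each model's own interface.

```python
import json
from typing import Any, Dict


def build_request(app_code: str, user_code: str, system_code: str,
                  model_type: str, native_params: Dict[str, Any]) -> Dict[str, Any]:
    # Only the envelope fields are set by the server; the native
    # parameters are passed through unchanged under "input".
    return {
        "appCode": app_code,
        "userCode": user_code,
        "systemCode": system_code,
        "modelType": model_type,
        "input": native_params,
    }


def build_result(native_return: Dict[str, Any], cost_token: int, msg_id: int,
                 bill_id: int, product_name: str, supply_type: int,
                 key_id: int) -> Dict[str, Any]:
    # Wrap the model's native return with resource and billing metadata.
    return {
        "data": {
            "costToken": cost_token,
            "msgId": msg_id,
            "billId": bill_id,
            "supplyType": supply_type,
            "keyId": key_id,
            "productName": product_name,
            "response": native_return,
        },
        "status": "SUCCESS",
        "message": "",
        "code": 0,
    }


# Hypothetical native payload, for illustration only.
request = build_request("test123", "123456", "INC-NLP-LLM",
                        "gemini-1.0-pro", {"prompt": "hello"})
print(json.dumps(request))
```

Keeping the native parameters and native return as opaque nested objects is what allows the envelope to stay the same across different large models.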
[0086] Through this embodiment, the server can invoke the target large model and generate the invocation result based on the invocation request, thereby improving the efficiency of invoking the target large model.
[0087] In one embodiment, after calling the target large model according to the call request and obtaining the call result output by the target large model based on the call request, the method further includes: recording the call result through the Kafka middleware in the gateway layer to obtain the recorded call result; and pushing the recorded call result to the management terminal.
[0088] In this embodiment, the server can also record the output results. The server includes a gateway layer containing Kafka middleware. The server can use the Kafka middleware in the gateway layer to record the call results, obtain the recorded call results, and push the recorded call results to the management terminal. The management terminal can be a device used to manage various large models. The management terminal can detect anomalies in each call based on the call results and handle any anomalies promptly.
[0089] Specifically, Figure 3 is a flowchart illustrating the traffic recording steps in one embodiment. The gateway layer mentioned above may include gateways, etc. The server inputs the aforementioned request structure into the corresponding target model through an interface. When the target model outputs the corresponding return parameters, the server can use existing or custom plugins from the gateway to implement traffic recording, request rate limiting, and request authentication. Traffic recording, combined with real-time computing tasks, completes the transfer of call details. The call details include relevant information for the entire request call chain, including all processes from the input of the request call structure to the generation of the call result.
[0090] Through this embodiment, the server can record the traffic of the call process based on Kafka middleware. By verifying the recorded traffic, it can detect whether there are any abnormalities in the call process and improve the security of calling large models.
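The recording step can be sketched without committing to a particular Kafka client. The sketch below assumes only that the producer object offers a send(topic, value) method that accepts bytes; an in-memory stand-in is used so the example runs without a broker. The record layout, the topic name, and the "abnormal" flag are hypothetical illustrations, not part of the described system.

```python
import json
import time
from typing import Any, Dict, List, Protocol, Tuple


class Producer(Protocol):
    # Assumed minimal interface of the message producer.
    def send(self, topic: str, value: bytes) -> Any: ...


class InMemoryProducer:
    """Stand-in for a Kafka producer so the sketch runs without a broker."""

    def __init__(self) -> None:
        self.messages: List[Tuple[str, bytes]] = []

    def send(self, topic: str, value: bytes) -> None:
        self.messages.append((topic, value))


def record_call(producer: Producer, topic: str,
                request: Dict[str, Any], result: Dict[str, Any]) -> Dict[str, Any]:
    # One record per call: the request structure, the call result, and a
    # flag the management terminal could use to spot abnormal calls.
    record = {
        "recordedAt": time.time(),
        "request": request,
        "result": result,
        "abnormal": result.get("status") != "SUCCESS",
    }
    producer.send(topic, json.dumps(record).encode("utf-8"))
    return record


producer = InMemoryProducer()
record_call(producer, "llm-call-records",
            {"modelType": "gemini-1.0-pro"}, {"status": "SUCCESS"})
```

Because the record is published asynchronously, the recording does not sit on the critical path of the model call; a downstream consumer can then push the records to the management terminal.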
[0091] In one exemplary embodiment, Figure 4 is a structural block diagram of a large model invocation system. In this embodiment, the server has multiple layers, including an open layer, a gateway layer, an interface layer, a core layer, a model capability layer, a middleware layer, a proxy layer, an infrastructure layer, and service monitoring and alarms.
[0092] The open layer provides interfaces for various large model series. Servers can implement functions such as traffic recording, authentication, interface rate limiting, and custom plugins at the gateway layer. At the interface layer, servers can recharge user accounts for resources and generate and transmit request structures and call results using account information, algorithm embedding vectors, and chat text. In the core layer, servers can perform input management, configuration management, and detail management. Input management includes validating call parameters, estimating resource request information, security interception, and load balancing. Servers can pre-configure multiple configuration models to store information from large model calls. Furthermore, during real-time computation tasks, servers can obtain configured numerical resource information from request models, parse it, and populate the request details, which are then fed into the data lake or data warehouse.
[0093] The model capability layer includes various types of large models. These include models that require numerical resource information, as well as various open-source large models that can be invoked without it. The middleware layer can contain multiple middleware components to meet the requirements for calling large models. These include: a MySQL database for storing configuration data and information with special requirements; Redis for caching necessary data to reduce program-database interaction; Nacos for centralized management and distribution of application configuration information; Saturn for managing and executing various tasks; Kafka for asynchronous communication between applications; and OSS for storing media files.
[0094] The proxy layer in the server can enable proxy access to large model capabilities across multiple regions. For example, a first-class large model can be accessed via a first-cloud proxy, and a second-class large model can be accessed via a second-cloud proxy. The first-class and second-class large models are located on different servers.
[0095] The infrastructure layer can include functions such as code repositories, pipelines, dependency management, elastic scaling, and vulnerability detection. In service monitoring and alerting, servers can use business intelligence tools combined with data from the data warehouse to perform cost statistics and risk control alerts. Real-time push notifications can be achieved using push tools, and APM (Application Performance Management) can be used to monitor anomalies in various services. Alert messages are promptly sent to the management console when anomalies occur.
[0096] Through the above embodiments, by setting a unified call request template and utilizing the call request parameters for large models, the request type and resource request information are determined. By matching the resource request information with the corresponding resource information in each large model, the target large model that meets the conditions is identified and called, achieving rapid access to large models and improving the efficiency of large model calls. Furthermore, through these embodiments, there is no need to focus on rate limiting, detailed database entry, numerical resource statistics, and alarms at the service level; only the integration with model capabilities needs to be addressed. Resource usage can be balanced directly through the template, thereby minimizing the cost of model integration and greatly improving development efficiency.
[0097] It should be understood that although the steps in the flowcharts of the above embodiments are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the above embodiments may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages of other steps.
[0098] Based on the same inventive concept, this application also provides a large model invocation apparatus for implementing the large model invocation method described above. The solution provided by this apparatus is similar to the implementation scheme described in the above method; therefore, the specific limitations in one or more large model invocation apparatus embodiments provided below can be found in the limitations of the large model invocation method described above, and will not be repeated here.
[0099] In one exemplary embodiment, as shown in Figure 5, a large model calling device is provided, including: an acquisition module 500, a first determination module 502, a second determination module 504, and a calling module 506, wherein:
[0100] The acquisition module 500 is used to acquire the large model call request parameters and to input the large model call request parameters into the preset call template to obtain the corresponding call request.
[0101] The first determining module 502 is used to determine the request type and the corresponding resource request information based on the call request.
[0102] The second determining module 504 is used to obtain the resource information corresponding to the request type in each of the various large models to be called, and to determine the target large model based on the resource request information and the resource information.
[0103] The calling module 506 is used to invoke the target large model according to the call request and obtain the call result output by the target large model based on the call request.
[0104] In one embodiment, the first determining module 502 is used to obtain the parameter type of the large model call request parameter in the call request; determine the request type of the call request based on the parameter type; and obtain the computing resource request information and numerical resource request information corresponding to the request type as resource request information.
[0105] In one embodiment, the first determining module 502 is configured to determine the request type as a first type if the parameter type is text, and obtain the first computing resource request information and the first numerical resource request information corresponding to the first type; if the parameter type is image, determine the request type as a second type, and obtain the second computing resource request information and the second numerical resource request information corresponding to the second type.
[0106] In one embodiment, the second determining module 504 is configured to, for each large model, if the request type is a first type, obtain first computing resource information and first numerical resource information from a preset configuration model; if the request type is a second type, obtain second computing resource information and second numerical resource information from a preset configuration model.
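The mapping from parameter type to request type, and the lookup of the matching resource information in a preset configuration model, can be sketched as follows. The text/image mapping follows the embodiments above; the enum, the function names, the model names, and every numeric value in the configuration table are hypothetical placeholders.

```python
from enum import Enum
from typing import Dict


class RequestType(Enum):
    FIRST = "first"    # parameter type is text
    SECOND = "second"  # parameter type is image


def request_type_for(parameter_type: str) -> RequestType:
    # Text parameters map to the first type, image parameters to the second.
    if parameter_type == "text":
        return RequestType.FIRST
    if parameter_type == "image":
        return RequestType.SECOND
    raise ValueError(f"unsupported parameter type: {parameter_type}")


# Hypothetical preset configuration model: per large model and request type,
# the computing resource information and numerical resource information.
CONFIG: Dict[str, Dict[RequestType, Dict[str, int]]] = {
    "model-a": {
        RequestType.FIRST: {"computing": 8, "numerical": 1000},
        RequestType.SECOND: {"computing": 16, "numerical": 4000},
    },
    "model-b": {
        RequestType.FIRST: {"computing": 4, "numerical": 500},
        RequestType.SECOND: {"computing": 12, "numerical": 3000},
    },
}


def resource_info(model_name: str, request_type: RequestType) -> Dict[str, int]:
    # Look up the resource information that matches the request type.
    return CONFIG[model_name][request_type]


print(resource_info("model-a", request_type_for("image")))
```

Keying the configuration by request type means that supporting a further parameter type would only require a new enum member and new table entries, without changing the lookup logic.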
[0107] In one embodiment, the second determining module 504 is configured to, for each large model, obtain a first comparison result between the current number of requests corresponding to the large model and a preset request frequency threshold; determine the resource call frequency based on the currently occupied resource information and resource request information in the resource information; obtain a second comparison result between the resource call frequency and a preset resource call frequency threshold; determine the model score corresponding to the large model based on the first comparison result, the second comparison result, and the preset weight corresponding to the large model; and determine the target large model based on the model scores corresponding to each large model.
[0108] In one embodiment, the aforementioned calling module 506 is used to obtain the requesting user information corresponding to the calling request and the model identifier corresponding to the target large model; generate a request structure based on the calling request, the requesting user information, and the model identifier; input the request structure into the target large model according to the calling interface corresponding to the target large model, and have the target large model output the corresponding return parameters based on the request structure; and generate the calling result corresponding to the calling request based on the return parameters, resource request information, and the model identifier of the target large model.
[0109] In one embodiment, the above apparatus further includes: a recording module, configured to record the call results through the Kafka middleware in the gateway layer to obtain the recorded call results; and push the recorded call results to the management terminal.
[0110] Each module in the aforementioned large-scale model calling device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in the processor of a computer device in hardware form or independent of it, or stored in the memory of a computer device in software form, so that the processor can call and execute the operations corresponding to each module.
[0111] In one exemplary embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in Figure 6. This computer device includes a processor, memory, input/output (I/O) interfaces, and a communication interface. The processor, memory, and I/O interfaces are connected via a system bus, and the communication interface is also connected to the system bus via the I/O interfaces. The processor provides computational and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system, computer programs, and a database. The internal memory provides the environment for running the operating system and computer programs stored in the non-volatile storage media. The database stores large model data. The I/O interfaces are used for exchanging information between the processor and external devices. The communication interface is used for communicating with external terminals via a network connection. When the computer program is executed by the processor, it implements a large model invocation method.
[0112] Those skilled in the art will understand that the structure shown in Figure 6 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.
[0113] In one exemplary embodiment, a computer device is provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the large model invocation method described above.
[0114] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed by a processor, implements the large model invocation method described above.
[0115] In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, implements the large model invocation method described above.
[0116] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of the relevant data must comply with relevant regulations.
[0117] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile memory and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, artificial intelligence (AI) processors, etc., and are not limited to these.
[0118] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this application.
[0119] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.
Claims
1. A method for calling a large model, characterized in that the method includes: obtaining large model call request parameters, and inputting the large model call request parameters into a preset call template to obtain a corresponding call request; determining, based on the call request, the request type and the corresponding resource request information; obtaining the resource information corresponding to the request type in each of the large models to be invoked, and determining the target large model based on the resource request information and each piece of the resource information; and invoking the target large model according to the call request, and obtaining the call result output by the target large model based on the call request.
2. The method according to claim 1, characterized in that the step of determining the request type and the corresponding resource request information based on the call request includes: obtaining the parameter type of the large model call request parameters in the call request; and determining the request type of the call request based on the parameter type, and obtaining the computing resource request information and numerical resource request information corresponding to the request type as the resource request information.
3. The method according to claim 2, characterized in that the step of determining the request type of the call request based on the parameter type, and obtaining the computing resource request information and numerical resource request information corresponding to the request type as the resource request information, includes: if the parameter type is text, determining the request type to be the first type, and obtaining the first computing resource request information and the first numerical resource request information corresponding to the first type; and if the parameter type is an image, determining the request type to be the second type, and obtaining the second computing resource request information and the second numerical resource request information corresponding to the second type.
4. The method according to claim 3, characterized in that the step of obtaining the resource information corresponding to the request type in each of the large models to be invoked includes: for each of the large models, if the request type is the first type, obtaining the first computing resource information and the first numerical resource information from the preset configuration model; and if the request type is the second type, obtaining the second computing resource information and the second numerical resource information from the preset configuration model.
5. The method according to claim 1, characterized in that the step of determining the target large model based on the resource request information and each piece of the resource information includes: for each of the large models, obtaining a first comparison result between the current number of requests corresponding to the large model and a preset request frequency threshold; determining the resource call frequency based on the currently occupied resource information in the resource information and the resource request information, and obtaining a second comparison result between the resource call frequency and a preset resource call frequency threshold; determining the model score corresponding to the large model based on the first comparison result, the second comparison result, and the preset weight corresponding to the large model; and determining the target large model based on the model scores corresponding to each of the large models.
6. The method according to claim 1, characterized in that the step of invoking the target large model according to the call request and obtaining the call result output by the target large model based on the call request includes: obtaining the requesting user information corresponding to the call request, and obtaining the model identifier corresponding to the target large model; generating a request structure based on the call request, the requesting user information, and the model identifier; inputting the request structure into the target large model according to the calling interface corresponding to the target large model, and having the target large model output the corresponding return parameters based on the request structure; and generating the call result corresponding to the call request based on the return parameters, the resource request information, and the model identifier of the target large model.
7. The method according to any one of claims 1 to 6, characterized in that, after invoking the target large model according to the call request and obtaining the call result output by the target large model based on the call request, the method further includes: recording the call result through the Kafka middleware in the gateway layer to obtain the recorded call result; and pushing the recorded call result to the management terminal.
8. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, when the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 7.
9. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 7.
10. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 7.