Agent platform-based evaluation task processing method and device, and equipment
By determining the position of subtasks in the evaluation queue of the intelligent agent platform based on evaluation rules, dialogue rounds, and evaluation objects, the problem of unfair task execution in multi-tenant architecture is solved, and fair collaboration and efficient scheduling of subtasks are achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- BEIJING VOLCANO ENGINE TECH CO LTD
- Filing Date
- 2026-05-11
- Publication Date
- 2026-06-19
Abstract
Description
Technical Field
[0001] One or more scenarios herein relate to an evaluation task processing method, an evaluation task processing device, and an electronic device based on an intelligent agent platform.
Background Technology
[0002] With the continuous development of artificial intelligence technology, intelligent agent platforms have emerged. For example, an intelligent agent platform can use multiple intelligent agents to collaboratively evaluate evaluation objects related to large models.
[0003] The aforementioned intelligent agent platforms typically adopt a multi-tenant architecture, meaning that different tenants can simultaneously execute evaluation tasks on the intelligent agent platform, and an evaluation task can also include multiple dialogue groups.
[0004] Therefore, for intelligent agent platforms, it is particularly important to improve the fairness among multiple dialogue groups in an evaluation task and the fairness among different tenants during the execution of evaluation tasks.
Summary of the Invention
[0005] This summary section is provided to briefly introduce the concepts, which will be described in detail in the detailed description section below. This summary section is not intended to identify key or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.
[0006] This paper provides at least one scenario of an evaluation task processing method based on an intelligent agent platform, wherein the intelligent agent platform is used to evaluate evaluation objects related to a large model. The method includes: obtaining a first evaluation task, wherein the first evaluation task indicates: evaluating at least one evaluation object using a first evaluation set under at least one evaluation rule, the first evaluation set including multiple dialogue groups, each of the multiple dialogue groups including at least one round of dialogue; for each of the multiple dialogue groups, performing the following steps: obtaining a virtual end time of a first subtask based on the complexity of the at least one evaluation rule, the number of dialogue rounds of the dialogue group, the number of at least one evaluation object, and the virtual time of the evaluation queue, wherein the first subtask is a subtask corresponding to the dialogue group; adding the first subtask to the evaluation queue based on the virtual end time of the first subtask, wherein the virtual end time of each subtask in the evaluation queue gradually increases; in response to the first subtask being at the head of the evaluation queue, executing the first subtask based on the dialogue group to obtain the evaluation result of the first subtask.
[0007] This paper provides at least one scenario of an evaluation task processing device based on an intelligent agent platform, wherein the intelligent agent platform is used to evaluate evaluation objects related to a large model. The device includes: an acquisition module configured to: acquire a first evaluation task, wherein the first evaluation task indicates: evaluating at least one evaluation object using a first evaluation set under at least one evaluation rule, the first evaluation set including multiple dialogue groups, each of the multiple dialogue groups including at least one round of dialogue; and an execution module configured to: for each of the multiple dialogue groups, perform the following steps: acquiring a virtual end time of a first subtask based on the complexity of the at least one evaluation rule, the number of dialogue rounds of the dialogue group, the number of at least one evaluation object, and the virtual time of the evaluation queue, wherein the first subtask is a subtask corresponding to the dialogue group; adding the first subtask to the evaluation queue based on the virtual end time of the first subtask, wherein the virtual end time of each subtask in the evaluation queue gradually increases; and executing the first subtask based on the dialogue group in response to the first subtask being at the head of the evaluation queue, thereby obtaining the evaluation result of the first subtask.
[0008] At least one aspect of this document provides an electronic device, including: at least one processor; and at least one memory, including one or more computer program instructions; wherein the one or more computer program instructions are executed by the processor to perform the evaluation task processing method based on an intelligent agent platform provided in at least one aspect of this document.
[0009] At least one aspect of this document provides a computer-readable storage medium that non-transitorily stores computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the evaluation task processing method based on an intelligent agent platform provided by at least one aspect of this document.
[0010] At least one aspect of this document provides a computer program product, including a computer program that, when executed by a processor, implements the evaluation task processing method based on an intelligent agent platform provided by at least one aspect of this document.
[0011] In the evaluation task processing method based on an intelligent agent platform provided in at least one scenario herein, where an intelligent agent platform with a multi-tenant architecture evaluates evaluation objects related to large models, the virtual time of the evaluation queue is used to uniformly manage, in units of subtasks, the different evaluation tasks created by different tenants. The position of a subtask in the evaluation queue is jointly determined by the evaluation rules, the evaluation objects, and the number of dialogue rounds: the shorter the execution time of the subtask, the closer it is placed to the head of the evaluation queue. Since the evaluation rules and evaluation objects are strongly correlated with the evaluation task (i.e., the tenant), and the number of dialogue rounds is strongly correlated with the dialogue group (i.e., the subtask), the factors affecting subtask execution are considered at both the tenant level and the subtask level. This improves fairness in task scheduling and task execution, both among different evaluation tasks of different tenants and among different subtasks within a single evaluation task, realizes two-level fair collaboration, and effectively reduces the waiting time of subtasks with short execution times.
Attached Figure Description
[0012] The above and other features, advantages, and aspects of the various scenarios described herein will become more apparent when taken in conjunction with the accompanying drawings and the following detailed description. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic, and that components and elements are not necessarily drawn to scale.
[0013] Figure 1 shows an application scenario diagram of an evaluation task processing method based on an intelligent agent platform provided in at least one scenario described herein;
[0014] Figure 2 shows a schematic diagram of the architecture of an intelligent agent platform provided in at least one scenario described herein;
[0015] Figure 3 shows a flowchart of an evaluation task processing method based on an intelligent agent platform provided in at least one scenario described herein;
[0016] Figure 4 shows a flowchart of an evaluation task processing method based on an intelligent agent platform provided in at least one scenario described herein;
[0017] Figure 5 shows a flowchart of a subtask retry mechanism provided in at least one scenario described herein;
[0018] Figure 6 shows a schematic structural diagram of an evaluation task processing device based on an intelligent agent platform provided in at least one scenario described herein; and
[0019] Figure 7 shows a schematic structural diagram of an electronic device suitable for implementing at least one scenario described herein.
Detailed Implementation
[0020] One or more scenarios described herein will now be described in more detail with reference to the accompanying drawings. While some scenarios are shown in the drawings, it should be understood that this document can be implemented in various forms and should not be construed as limited to the scenarios set forth herein; rather, these scenarios are provided to provide a more thorough and complete understanding of this document. It should be understood that the accompanying drawings and scenarios are for illustrative purposes only and are not intended to limit the scope of this document.
[0021] It should be understood that the steps described in the method embodiments herein may be performed in different orders and / or in parallel. Furthermore, the method embodiments may include additional steps and / or omit the steps shown. The scope of this document is not limited in this respect.
[0022] The term "comprising" and its variations as used herein are open-ended inclusions, meaning "including but not limited to". The term "based on" means "at least partially based on". The term "one situation" means "at least one situation"; the term "another situation" means "at least one additional situation"; the term "some situations" means "at least some situations". Definitions of other terms will be given in the following description.
[0023] It should be noted that the concepts of "first" and "second" mentioned in this article are only used to distinguish different devices, modules or units, and are not used to limit the order of the functions performed by these devices, modules or units or their interdependencies.
[0024] It should be noted that the terms "one" and "more" used in this document are illustrative rather than restrictive, and those skilled in the art should understand that, unless otherwise expressly indicated in the context, they should be understood as "one or more".
[0025] The names of the messages or information exchanged between the various devices in the embodiments herein are for illustrative purposes only and are not intended to limit the scope of these messages or information.
[0026] It is understood that the data involved in this technical solution (including but not limited to the data itself, the acquisition, use, storage or deletion of the data) shall comply with the requirements of relevant laws, regulations and related provisions.
[0027] It is understood that before using the technical solutions disclosed in each scenario in this article, relevant users should be informed of the type, scope of use, and usage scenarios of the information involved in this article and their authorization should be obtained through appropriate means in accordance with relevant laws and regulations. Relevant users may include any type of rights holder, such as individuals, enterprises, or groups.
[0028] For example, in response to receiving an active request from a user, a prompt message is sent to the relevant user to clearly inform the user that the requested operation will require obtaining and using the user's information, thereby enabling the relevant user to choose whether to provide information to the software or hardware such as electronic devices, applications, servers, or storage media that perform the operation of any of the technical solutions described herein.
[0029] As an optional but non-restrictive implementation, in response to a user's active request, a prompt message can be sent to the user, such as a pop-up window, where the prompt message can be presented in text format. Furthermore, the pop-up window can also include a selection control allowing the user to choose "agree" or "disagree" to provide information to the electronic device.
[0030] It is understood that the above notification and user authorization process are merely illustrative and do not constitute a limitation on the implementation method described in this article. Other methods that comply with relevant laws and regulations may also be applied to the implementation method described in this article.
[0031] An intelligent agent platform can be understood as a software middleware system that provides infrastructure for intelligent agent lifecycle management and interaction, enabling dynamic task scheduling and elastic resource allocation among multiple intelligent agents.
[0032] Intelligent agent platforms can be applied in different scenarios. For example, intelligent agent platforms can be used to evaluate evaluation objects related to large models. That is, intelligent agent platforms can provide evaluation services related to large models as large model evaluation platforms. For example, intelligent agent platforms can test the reasoning ability, reasoning performance, security and robustness of evaluation objects related to large models, and provide decision-making basis for the research, development, selection and optimization of evaluation objects related to large models.
[0033] The intelligent agent platform can adopt a multi-tenant architecture, that is, in the same physical or virtualized platform instance, through the isolation mechanism, multiple independent tenants (such as organizations or teams) are allowed to use the intelligent agent platform at the same time, and the data between multiple tenants is not visible, resources do not conflict, and configurations do not interfere with each other.
[0034] For multi-tenant architecture intelligent agent platforms, during the execution of evaluation tasks, task scheduling and execution are usually carried out in units of subtasks. For example, the evaluation set of an evaluation task may include multiple dialogue groups, and one dialogue group corresponds to one subtask. That is, an evaluation task may include multiple subtasks. The intelligent agent platform can schedule different subtasks in different evaluation tasks and interleave different subtasks in different evaluation tasks created by different tenants.
[0035] However, the above approach has at least two problems. First, for different evaluation tasks created by different tenants, execution times differ because the evaluation rules and evaluation objects differ. Second, within a single evaluation task, the execution times of different subtasks also differ because different dialogue groups contain different numbers of dialogue rounds. As a result, when multiple tenants share resources, it is difficult for the intelligent agent platform to achieve either cross-tenant fairness or intra-task fairness in task execution. Subtasks with short execution times can easily be left waiting for a long time, resulting in unfair scheduling phenomena such as long-tail blocking.
[0036] To at least partially solve the aforementioned technical problem, this paper provides a method for processing evaluation tasks based on an intelligent agent platform. This platform is used to evaluate evaluation objects related to a large model. The method includes: obtaining a first evaluation task, which indicates that at least one evaluation object is evaluated using a first evaluation set under at least one evaluation rule. The first evaluation set includes multiple dialogue groups, each including at least one round of dialogue. For each dialogue group, the following steps are performed: obtaining the virtual end time of a first subtask based on the complexity of at least one evaluation rule, the number of dialogue rounds in the dialogue group, the number of at least one evaluation object, and the virtual time of the evaluation queue. The first subtask is the subtask corresponding to the dialogue group. Based on the virtual end time of the first subtask, the first subtask is added to the evaluation queue. The virtual end times of each subtask in the evaluation queue gradually increase. In response to the first subtask being at the head of the evaluation queue, the first subtask is executed based on the dialogue group to obtain the evaluation result of the first subtask.
[0037] Based on the evaluation task processing method based on an intelligent agent platform provided in at least one of the embodiments described herein, an evaluation task processing apparatus, electronic device, computer-readable storage medium, and computer program product based on an intelligent agent platform are also provided in at least one of the embodiments described herein.
[0038] In the evaluation task processing method based on an intelligent agent platform provided in at least one scenario herein, where an intelligent agent platform with a multi-tenant architecture evaluates evaluation objects related to large models, the virtual time of the evaluation queue is used to uniformly manage, in units of subtasks, the different evaluation tasks created by different tenants. The position of a subtask in the evaluation queue is jointly determined by the evaluation rules, the evaluation objects, and the number of dialogue rounds: the shorter the execution time of the subtask, the closer it is placed to the head of the evaluation queue. Since the evaluation rules and evaluation objects are strongly correlated with the evaluation task (i.e., the tenant), and the number of dialogue rounds is strongly correlated with the dialogue group (i.e., the subtask), the factors affecting subtask execution are considered at both the tenant level and the subtask level. This improves fairness in task scheduling and task execution, both among different evaluation tasks of different tenants and among different subtasks within a single evaluation task, realizes two-level fair collaboration, and effectively reduces the waiting time of subtasks with short execution times.
[0039] The following detailed description, with reference to the accompanying drawings, illustrates one or more scenarios and some examples thereof.
[0040] Figure 1 shows an application scenario diagram of an evaluation task processing method based on an intelligent agent platform provided in at least one scenario described herein.
[0041] As shown in Figure 1, the application scenario provided in this case may include user 101, terminal device 102, and intelligent agent platform 103. Terminal device 102 can be any of various electronic devices capable of providing interactive pages, such as smart wearable devices, smart appliances, smart cars, mobile phones, tablets, laptops, or desktop computers.
[0042] For example, a client may be installed in the terminal device 102. This client may be a client of the intelligent agent platform. The intelligent agent platform 103 may be a server that supports the operation of the client installed in the terminal device 102. That is, the intelligent agent platform 103 may be a server that evaluates the evaluation objects related to the large model. For example, the intelligent agent platform 103 may be a server for a local area network or a wide area network, or it may be a cloud server, etc. One or more scenarios herein are not limited in this respect.
[0043] User 101 can be a user of a client installed on terminal device 102. For example, user 101 can be a user who uses intelligent agent platform 103 for evaluation.
[0044] The intelligent agent platform 103 can communicate with the terminal device 102, for example, by providing the client installed on the terminal device 102 with relevant data (such as page display content) required by the terminal device to run the client; or, for example, the intelligent agent platform 103 can also receive relevant data (such as data related to the evaluation task) returned by the terminal device 102 during the running of the client.
[0045] For example, the evaluation task processing methods based on intelligent agent platforms provided in one or more scenarios of this paper can be implemented in software, hardware, firmware, or any combination thereof.
[0046] For example, the evaluation task processing method based on the intelligent agent platform provided in one or more scenarios of this paper is applicable to the intelligent agent platform 103, which can load and execute the evaluation task processing method based on the intelligent agent platform. For example, the intelligent agent platform 103 may include a central processing unit (CPU) or graphics processing unit (GPU), digital signal processor (DSP), neural network processing unit (NPU), or other forms of processing units with data processing capabilities and / or instruction execution capabilities, storage units, etc. The intelligent agent platform 103 is also equipped with an operating system and various types of application programming interfaces (APIs), and implements the evaluation task processing method based on the intelligent agent platform provided in one or more scenarios of this paper by running code or instructions.
[0047] The overall architecture of the intelligent agent platform is described below.
[0048] Figure 2 shows a schematic diagram of the architecture of an intelligent agent platform provided in at least one scenario described herein.
[0049] As shown in Figure 2, the intelligent agent platform may include a snapshot unit, a dispatch unit, a queue unit, an evaluation unit, and a reporting unit. Each of these units can be implemented by one or more intelligent agents.
[0050] In one or more scenarios described herein, the snapshot unit can be used to receive the evaluation task and take a snapshot of the information related to the evaluation task; the dispatch unit can be used to check each subtask in the evaluation task based on the amount of available resources; the queue unit can be connected to a database, which can be used to store the evaluation queue; the queue unit can be used to obtain the virtual end time of each subtask in the evaluation task and add each subtask in the evaluation task to the evaluation queue; the evaluation unit can be used to execute one or more subtasks in the evaluation queue in sequence; the evaluation unit can be connected to a token bucket and can use the tokens generated by the token bucket to execute each subtask; and the reporting unit can be used to integrate the evaluation results of each subtask into the evaluation result of the evaluation task after all subtasks of the evaluation task have been executed.
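The token bucket mentioned above can be illustrated with a minimal sketch. The text does not specify how tokens are generated or consumed, so this assumes the standard token bucket scheme (refill at a fixed rate up to a capacity) and one token per subtask execution. The class name `TokenBucket`, its parameters, and the injectable `clock` are hypothetical.

```python
import time


class TokenBucket:
    """Standard token bucket: tokens refill at `rate` per second up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)   # start with a full bucket
        self._last = clock()

    def try_acquire(self, n=1):
        """Take n tokens if available; return whether the caller may proceed."""
        now = self._clock()
        # Refill according to the time elapsed since the last call.
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= n:
            self._tokens -= n
            return True
        return False


# Deterministic demo with a fake clock instead of real time.
fake_now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=2, clock=lambda: fake_now[0])
results = [bucket.try_acquire() for _ in range(3)]  # two tokens, third refused
fake_now[0] = 1.0                                   # one second later: one new token
results.append(bucket.try_acquire())
```

In this sketch an evaluation unit would call `try_acquire()` before executing a subtask and defer the subtask when it returns `False`.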
[0051] In one or more scenarios described herein, the snapshot unit, dispatch unit, queue unit, evaluation unit, and reporting unit can communicate via message queues. For example, the snapshot unit can subscribe to a snapshot topic in the message queue. When a user in a tenant creates an evaluation task, a message (e.g., including an evaluation task identifier) is produced in the snapshot topic. The snapshot unit consumes messages from the snapshot topic, takes a snapshot of the evaluation task, and, after the snapshot is complete, produces messages (e.g., including the evaluation task identifier) in the dispatch topic. The dispatch unit can subscribe to the dispatch topic in the message queue, consume messages from the dispatch topic, check each subtask in the evaluation task based on the available resources, and then produce messages (e.g., including the identifiers of the subtasks that passed the check) in the queue topic. The queue unit can subscribe to the queue topic in the message queue, consume messages from the queue topic, obtain the virtual end time of each subtask in the evaluation task, add each subtask to the evaluation queue, and then produce messages in the evaluation topic (e.g., including the identifier of the subtask currently at the head of the evaluation queue). The evaluation unit can subscribe to the evaluation topic in the message queue, consume messages from the evaluation topic, execute subtasks using tokens generated by the token bucket, and then produce messages in the report topic (e.g., including the identifier of the currently completed subtask). The reporting unit can subscribe to the report topic in the message queue, consume messages from the report topic, and, after all subtasks of the evaluation task have been executed, integrate the evaluation results of each subtask into the evaluation result of the evaluation task.
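The topic-based hand-off between units can be sketched with a toy in-memory broker. This is an illustration only: the `MessageQueue` class and `run_pipeline` function are hypothetical, each unit's real work is omitted, and the same identifier is simply forwarded from topic to topic, whereas the platform described here passes task identifiers in the early stages and subtask identifiers in the later ones.

```python
from collections import defaultdict, deque


class MessageQueue:
    """Toy in-memory broker: one FIFO per topic."""

    def __init__(self):
        self._topics = defaultdict(deque)

    def produce(self, topic, message):
        self._topics[topic].append(message)

    def consume(self, topic):
        """Return the next message on the topic, or None if it is empty."""
        pending = self._topics[topic]
        return pending.popleft() if pending else None


# Each unit consumes from its own topic and produces to the next one.
PIPELINE = [
    ("snapshot", "dispatch"),    # snapshot unit
    ("dispatch", "queue"),       # dispatch unit
    ("queue", "evaluation"),     # queue unit
    ("evaluation", "report"),    # evaluation unit
]


def run_pipeline(mq, trace):
    """Drain every stage in order, recording which topic handled each message."""
    for source, target in PIPELINE:
        while (msg := mq.consume(source)) is not None:
            trace.append((source, msg))
            mq.produce(target, msg)    # a real unit would do its work here
    while (msg := mq.consume("report")) is not None:
        trace.append(("report", msg))  # the reporting unit is the final consumer


mq = MessageQueue()
mq.produce("snapshot", "task-1")       # a tenant creates an evaluation task
trace = []
run_pipeline(mq, trace)
```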
[0052] An evaluation task processing method based on an intelligent agent platform provided in at least one scenario herein is described in detail below with reference to Figures 3 to 5.
[0053] Figure 3 shows a flowchart of an evaluation task processing method based on an intelligent agent platform provided in at least one scenario described herein.
[0054] As shown in Figure 3, the evaluation task processing method based on the intelligent agent platform in this scenario includes steps S301 to S302. In some cases, the executing entity of this evaluation task processing method can be the intelligent agent platform itself. This intelligent agent platform can be used to evaluate evaluation objects related to large models. The steps included in this evaluation task processing method are described below:
[0055] Step S301: Obtain the first evaluation task.
[0056] The first evaluation task can be understood as any evaluation task that evaluates the evaluation objects related to the large model. That is, the first evaluation task can be an evaluation task created by any user under any tenant of the intelligent agent platform. For example, the first evaluation task can be sent by the first sender, and the first sender can belong to the first tenant.
[0057] The first evaluation task may include a variety of different evaluation information. For example, the first evaluation task may instruct the evaluation of at least one evaluation object under at least one evaluation rule using a first evaluation set.
[0058] The first evaluation set can be understood as the evaluation data used for evaluation. For example, the first evaluation set may include multiple dialogue groups, each of which may include at least one round of dialogue, and the at least one round of dialogue may include input information and reference response information.
[0059] For example, a dialogue group can consist of only one round of dialogue, where the input information could be "What are the attractions in city A?" and the reference response information could be "City A has attractions A, B, and C". A dialogue group can also consist of multiple rounds of dialogue, where the input information for the first round of dialogue could be "1+2=" and the reference response information could be "3", the input information for the second round of dialogue could be "+5" and the reference response information could be "8", the input information for the third round of dialogue could be "+8" and the reference response information could be "16".
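The dialogue-group structure in this example can be pictured with a small data model. The following is a minimal sketch rather than the platform's actual schema; the class and field names (`DialogueRound`, `DialogueGroup`, `input_text`, `reference_response`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DialogueRound:
    """One round of dialogue: input information and its reference response."""
    input_text: str
    reference_response: str


@dataclass
class DialogueGroup:
    """A dialogue group holds at least one round of dialogue."""
    rounds: List[DialogueRound] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.rounds:
            raise ValueError("a dialogue group needs at least one round")

    @property
    def num_rounds(self) -> int:
        return len(self.rounds)


# The two examples from the text: a single-round and a three-round group.
single_round = DialogueGroup([
    DialogueRound("What are the attractions in city A?",
                  "City A has attractions A, B, and C"),
])
multi_round = DialogueGroup([
    DialogueRound("1+2=", "3"),
    DialogueRound("+5", "8"),
    DialogueRound("+8", "16"),
])
```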
[0060] At least one evaluation rule can be understood as an evaluation criterion used to constrain evaluation objects related to a large model. For example, at least one evaluation rule may include one or more of the following: prompt evaluation rule, natural language processing (NLP) evaluation rule, retrieval-augmented generation (RAG) evaluation rule, and code capability evaluation rule.
[0061] For example, prompt evaluation rules can be used to evaluate the ability of evaluation objects related to large models to understand and follow prompts; natural language processing evaluation rules can be used to evaluate the ability of evaluation objects related to large models to understand and generate natural language; retrieval-augmented generation evaluation rules can be used to evaluate the ability of evaluation objects related to large models to retrieve and generate information; and code capability evaluation rules can be used to evaluate the ability of evaluation objects related to large models to generate and execute code.
[0062] At least one evaluation object can be understood as an object that has an evaluation requirement. For example, at least one evaluation object can be one or more of at least one large model and at least one intelligent agent.
[0063] This paper does not restrict the method of obtaining the first evaluation task in one or more scenarios. For example, the intelligent agent platform can provide an interactive interface, and the first sender can input the first evaluation set and at least one evaluation object in the interactive interface, and select at least one evaluation rule to create the first evaluation task.
[0064] Step S302: For each of the multiple dialogue groups, perform the following steps (steps S3021 to S3023).
[0065] Step S3021: Obtain the virtual end time of the first subtask based on the complexity of at least one evaluation rule, the number of dialogue rounds in the dialogue group, the number of at least one evaluation object, and the virtual time of the evaluation queue.
[0066] Step S3022: Add the first subtask to the evaluation queue according to the virtual end time of the first subtask.
[0067] Step S3023: In response to the first subtask being at the head of the evaluation queue, execute the first subtask based on the dialogue group and obtain the evaluation result of the first subtask.
[0068] Since the first evaluation set includes multiple dialogue groups, evaluating at least one evaluation object using the first evaluation set under at least one evaluation rule can be broken down into evaluating at least one evaluation object using multiple dialogue groups under at least one evaluation rule. Therefore, the first evaluation task can be broken down into multiple subtasks, with one subtask corresponding to one dialogue group.
[0069] For example, the first evaluation set includes dialogue group A and dialogue group B. The first evaluation task can be divided into subtask A and subtask B. Subtask A is to evaluate at least one evaluation object using dialogue group A under at least one evaluation rule. Subtask B is to evaluate at least one evaluation object using dialogue group B under at least one evaluation rule.
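The decomposition above can be sketched in a few lines: one subtask per dialogue group, with the evaluation rules and evaluation objects shared across all subtasks of the task. The names `EvaluationTask`, `Subtask`, and `split_into_subtasks`, and the identifiers used in the demo, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class EvaluationTask:
    tenant_id: str
    rules: Sequence[str]                      # at least one evaluation rule
    objects: Sequence[str]                    # at least one evaluation object
    dialogue_groups: Sequence[Sequence[str]]  # per group: the inputs of its rounds


@dataclass(frozen=True)
class Subtask:
    task: EvaluationTask
    group_index: int

    @property
    def num_rounds(self) -> int:
        return len(self.task.dialogue_groups[self.group_index])


def split_into_subtasks(task: EvaluationTask) -> List[Subtask]:
    """One subtask per dialogue group; rules and objects are shared."""
    return [Subtask(task, i) for i in range(len(task.dialogue_groups))]


task = EvaluationTask(
    tenant_id="tenant-1",
    rules=("prompt",),
    objects=("model-x",),
    dialogue_groups=(
        ("What are the attractions in city A?",),  # dialogue group A
        ("1+2=", "+5", "+8"),                      # dialogue group B
    ),
)
subtasks = split_into_subtasks(task)  # subtask A and subtask B
```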
[0070] During the processing of each dialogue group, the subtask corresponding to that dialogue group is called the first subtask. For the first subtask, the complexity of at least one evaluation rule, the number of dialogue rounds in the dialogue group, and the number of at least one evaluation object all affect the execution time of the first subtask.
[0071] Different evaluation rules correspond to different levels of complexity. For example, since the prompt evaluation rule only imposes surface constraints on at least one evaluation object, its complexity is relatively low. Natural language processing (NLP) evaluation rules impose linguistic constraints on at least one evaluation object, so their complexity can be higher than that of prompt evaluation rules. Similarly, retrieval-augmented generation (RAG) evaluation rules impose constraints on the retrieval and generation chain for at least one evaluation object, resulting in higher complexity than NLP evaluation rules. Likewise, code capability evaluation rules impose constraints on code generation and execution for at least one evaluation object, leading to higher complexity than RAG evaluation rules.
[0072] Since different evaluation rules correspond to different levels of complexity, the higher the complexity, the longer the execution time. Therefore, the type of evaluation rule affects the execution time of the first subtask. The larger the number of dialogue rounds in a dialogue group, the longer the execution time. Therefore, the number of dialogue rounds in a dialogue group affects the execution time of the first subtask. The more evaluation objects there are, the longer the execution time. Therefore, the number of evaluation objects affects the execution time of the first subtask.
[0073] In one or more scenarios described herein, the weighted fair queuing (WFQ) algorithm is applied to the subtask scheduling scenario of an intelligent agent platform. Subtasks to be executed are added to the evaluation queue. For example, different subtasks of different evaluation tasks of different tenants can be added to the same evaluation queue. That is, the evaluation queue is used to uniformly schedule the evaluation tasks of multiple tenants.
[0074] The virtual time of the evaluation queue can be understood as the system virtual time, and it is related to the execution progress of the subtasks in the evaluation queue. The virtual end time of the first subtask, known in WFQ as the virtual finish time (VFT), can be understood as the estimated time at which the first subtask will be completed, measured in the virtual time of the evaluation queue.
[0075] For example, obtaining the virtual end time of the first subtask based on the complexity of at least one evaluation rule, the number of dialogue rounds in the dialogue group, the number of at least one evaluation object, and the virtual time of the evaluation queue includes: obtaining the execution time of the first subtask based on the complexity of at least one evaluation rule, the number of dialogue rounds in the dialogue group, and the number of at least one evaluation object; and obtaining the virtual end time of the first subtask based on the virtual time of the evaluation queue and the execution time of the first subtask.
[0076] For example, the virtual time of the evaluation queue is 20, the execution time of the first subtask is 5, and the virtual end time of the first subtask is 25.
[0077] The subtasks in the evaluation queue are sorted by virtual end time, in ascending order from the head of the queue to the tail. That is, the smaller the virtual end time of a subtask, the earlier its position in the evaluation queue, and the larger the virtual end time of a subtask, the later its position in the evaluation queue.
[0078] The intelligent agent platform can process the subtasks in the evaluation queue sequentially. For example, in each subtask execution pass, the intelligent agent platform can retrieve the subtask at the head of the evaluation queue and execute it.
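As a non-limiting illustration, the ordering and head-of-queue retrieval described above can be sketched with a min-heap keyed on the virtual end time. The class name `EvaluationQueue`, its methods, and the subtask identifiers are hypothetical; the description does not prescribe a particular data structure, and the insertion counter used to break ties is an assumption.

```python
import heapq
import itertools


class EvaluationQueue:
    """Hypothetical evaluation queue ordered by ascending virtual end time."""

    def __init__(self):
        self._heap = []
        # Assumption: subtasks with equal virtual end times keep insertion order.
        self._counter = itertools.count()

    def add(self, subtask_id, virtual_end_time):
        entry = (virtual_end_time, next(self._counter), subtask_id)
        heapq.heappush(self._heap, entry)

    def pop_head(self):
        # The head of the queue is the subtask with the smallest virtual end time.
        virtual_end_time, _, subtask_id = heapq.heappop(self._heap)
        return subtask_id, virtual_end_time

    def __len__(self):
        return len(self._heap)


queue = EvaluationQueue()
queue.add("tenant-1/subtask-a", 25)  # e.g. virtual time 20 + execution time 5
queue.add("tenant-2/subtask-b", 22)
queue.add("tenant-1/subtask-c", 30)

execution_order = []
while len(queue) > 0:
    subtask_id, _ = queue.pop_head()
    execution_order.append(subtask_id)

print(execution_order)
```

Subtasks of different tenants share one queue here, so a short subtask of one tenant can run ahead of a longer subtask of another tenant that was submitted earlier.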
[0079] For the first subtask, once all other subtasks preceding it in the evaluation queue have been completed, the first subtask will be at the head of the evaluation queue. The agent platform will then execute the first subtask, meaning that the agent platform will use the dialogue group to evaluate at least one evaluation object under at least one evaluation rule to obtain the evaluation result of the first subtask.
[0080] In this way, different subtasks of different tenants are uniformly scheduled through the evaluation queue. When scheduling subtasks, the factors that affect execution time (the complexity of the evaluation rules, the number of dialogue rounds, and the number of evaluation objects) are fully considered. This improves scheduling fairness both between the tasks of different tenants and between different subtasks of a single evaluation task. While the throughput of the intelligent agent platform is maintained, task execution latency is reduced and overall fairness is improved.
[0081] The process of obtaining the virtual end time of the first subtask is explained below.
[0082] In some cases, in response to an empty evaluation queue, the virtual time of the evaluation queue is updated to the set initial time, or in response to the completion of the second subtask in the evaluation queue, the virtual time of the evaluation queue is updated to the virtual end time of the second subtask.
[0083] In other words, when there is no previous subtask execution process in the evaluation queue, the virtual time of the evaluation queue is the set initial time. When there is a previous subtask execution process in the evaluation queue, the virtual end time of the subtask that was completed in the previous subtask execution process (i.e. the second subtask) is used as the virtual time of the evaluation queue.
[0084] For example, in response to the evaluation queue being empty, there is no previous subtask execution process in the evaluation queue, and the virtual time of the evaluation queue is 0. Evaluation task A includes subtasks A1, A2, and A3. The execution time of subtask A1 is 1, the execution time of subtask A2 is 1, and the execution time of subtask A3 is 2. Then the virtual end time of subtask A1 is 1, the virtual end time of subtask A2 is 2, and the virtual end time of subtask A3 is 3. Therefore, subtask A1 is in the first position in the evaluation queue, subtask A2 is in the second position, and subtask A3 is in the third position. During the first subtask execution pass, subtask A1 is executed. After subtask A1 is completed, the virtual time of the evaluation queue is updated to 1. At this time, evaluation task B is created. Evaluation task B includes subtask B1 and subtask B2. The execution time of subtask B1 is 0.5, and the execution time of subtask B2 is 1.5. Therefore, the virtual end time of subtask B1 is 1.5, and the virtual end time of subtask B2 is 2.5. Thus, subtask B1 is in the first position in the evaluation queue, subtask A2 is in the second position, subtask B2 is in the third position, and subtask A3 is in the fourth position.
[0085] In this way, after each subtask is completed, the virtual time of the evaluation queue is updated in real time so that it matches the current subtask execution progress. This allows different subtasks in different evaluation tasks of different tenants to be managed uniformly according to their execution time, using the virtual time of the evaluation queue.
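As a non-limiting illustration, the queue state in the example above can be replayed in a few lines. The virtual end times of subtasks A1, A2, and A3 are taken as stated in the example rather than recomputed; the virtual end times of subtasks B1 and B2 are computed as the virtual time of the evaluation queue plus the execution time. The variable names are hypothetical.

```python
import heapq

# Virtual end times of evaluation task A, as stated in the example.
queue = []
for name, virtual_end in [("A1", 1), ("A2", 2), ("A3", 3)]:
    heapq.heappush(queue, (virtual_end, name))

# First execution pass: the head of the queue runs, and the virtual time
# of the evaluation queue is updated to its virtual end time.
virtual_end, finished = heapq.heappop(queue)
virtual_time = virtual_end

# Evaluation task B is created now; each virtual end time is the current
# virtual time plus the execution time of the subtask.
for name, execution_time in [("B1", 0.5), ("B2", 1.5)]:
    heapq.heappush(queue, (virtual_time + execution_time, name))

remaining_order = [name for _, name in sorted(queue)]
print(finished, virtual_time, remaining_order)
# A1 1 ['B1', 'A2', 'B2', 'A3']
```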
[0086] In some cases, the virtual end time of the first subtask is obtained based on the complexity of at least one evaluation rule, the number of dialogue rounds in the dialogue group, the number of at least one evaluation object, and the virtual time of the evaluation queue. This includes: obtaining the complexity coefficient of the first subtask based on the complexity of at least one evaluation rule; obtaining the size coefficient of the first subtask based on the number of dialogue rounds in the dialogue group and the number of at least one evaluation object; obtaining the virtual time of the evaluation queue; and obtaining the virtual end time of the first subtask based on the complexity coefficient, the size coefficient, and the virtual time of the evaluation queue.
[0087] In other words, the complexity coefficient of the first subtask is obtained based on the complexity of at least one evaluation rule, the size coefficient of the first subtask is obtained based on the number of dialogue rounds in the dialogue group and the number of at least one evaluation object, and the virtual end time of the first subtask is calculated by combining the complexity coefficient and the size coefficient of the first subtask with the virtual time of the evaluation queue.
[0088] For example, the greater the complexity of at least one evaluation rule, the smaller the complexity coefficient of the first subtask; the greater the number of dialogue rounds in the dialogue group and the number of at least one evaluation object, the greater the size coefficient of the first subtask.
[0089] For example, the complexity coefficient of the first subtask can be the reciprocal of the product of the complexity of at least one evaluation rule, and the size coefficient of the first subtask can be the product of the number of dialogue rounds in the dialogue group and the number of at least one evaluation object.
[0090] For example, the larger the size coefficient of the first subtask, the longer the execution time of the first subtask, and the larger the virtual end time of the first subtask; the larger the complexity coefficient of the first subtask, the shorter the execution time of the first subtask, and the smaller the virtual end time of the first subtask.
[0091] For example, the virtual end time of the first subtask can be the sum of the ratio of the size coefficient of the first subtask to the complexity coefficient of the first subtask and the virtual time of the evaluation queue.
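As a non-limiting illustration, the example formulas above can be written out directly: the complexity coefficient is the reciprocal of the product of the rule complexities, the size coefficient is the product of the number of dialogue rounds and the number of evaluation objects, and the virtual end time is the virtual time of the evaluation queue plus the ratio of the size coefficient to the complexity coefficient. The function names and the numeric complexity values are hypothetical; the description does not assign numeric complexities to evaluation rules.

```python
from math import prod


def complexity_coefficient(rule_complexities):
    """Reciprocal of the product of the evaluation rule complexities."""
    return 1.0 / prod(rule_complexities)


def size_coefficient(dialogue_rounds, num_evaluation_objects):
    """Product of the dialogue round count and the evaluation object count."""
    return dialogue_rounds * num_evaluation_objects


def virtual_end_time(rule_complexities, dialogue_rounds,
                     num_evaluation_objects, queue_virtual_time):
    size = size_coefficient(dialogue_rounds, num_evaluation_objects)
    complexity = complexity_coefficient(rule_complexities)
    # Larger size -> later end time; larger complexity coefficient
    # (i.e. simpler rules) -> earlier end time.
    return queue_virtual_time + size / complexity


# Hypothetical values: two rules with complexities 2 and 4, a dialogue
# group with 3 rounds, 2 evaluation objects, queue virtual time 20.
vft = virtual_end_time([2.0, 4.0], 3, 2, 20.0)
print(vft)  # 68.0
```

With these values the complexity coefficient is 1/8 and the size coefficient is 6, so the ratio is 48 and the virtual end time is 20 + 48 = 68.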
[0092] In this way, obtaining the complexity coefficient and the size coefficient of the first subtask gives the virtual end time of the first subtask a well-defined basis. Subtasks in the evaluation queue with a large complexity coefficient (i.e., low evaluation rule complexity and short execution time) experience lower latency, so simple and small subtasks can be processed first. This improves the task benefit of subtasks with a small size coefficient (i.e., few dialogue rounds and few evaluation objects) and a large complexity coefficient (i.e., low evaluation rule complexity).
[0093] Figure 4 illustrates a flowchart of an evaluation task processing method based on an intelligent agent platform, provided in at least one of the scenarios described herein.
[0094] As shown in Figure 4, in some cases, after obtaining the first evaluation task, a snapshot of the first evaluation task can be taken. For example, a snapshot of the first evaluation set, at least one evaluation rule, and at least one evaluation object of the first evaluation task can be taken to obtain the first snapshot information.
[0095] A snapshot can be understood as a data protection technology based on a pointer mapping mechanism, which enables rapid backup and recovery by recording the data state at a specific point in time.
[0096] The manner of taking a snapshot of the first evaluation task is not limited in one or more scenarios herein. For example, the agent platform can use full serialization to save the first evaluation set, the at least one evaluation rule, and the at least one evaluation object of the first evaluation task as serialized information for storage; or the agent platform can use hash storage to convert the first evaluation set, the at least one evaluation rule, and the at least one evaluation object of the first evaluation task into hash values for storage.
[0097] Thus, before executing any subtask in the first evaluation task, a snapshot is taken of the first evaluation set, at least one evaluation rule, and at least one evaluation object to generate an immutable copy of the evaluation environment (i.e., the first snapshot information). This prevents the first evaluation set, at least one evaluation rule, and at least one evaluation object from being edited or modified during the subsequent execution of the first evaluation task. After the first evaluation task is completed, the first snapshot information and the evaluation result of the first evaluation task can be strongly bound together, enabling the evaluation result of the first evaluation task to be reproducible, traceable, and auditable, thereby improving the reliability of the intelligent agent platform.
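As a non-limiting illustration, a snapshot of this kind can be sketched by serializing the evaluation set, the evaluation rules, and the evaluation objects to a canonical JSON string and recording a SHA-256 digest of that string. The function names, the field names, and the choice of JSON and SHA-256 are assumptions made for the sketch; the description does not fix a serialization format or hash function.

```python
import hashlib
import json


def take_snapshot(evaluation_set, evaluation_rules, evaluation_objects):
    """Return (serialized, digest) for the evaluation environment."""
    payload = {
        "evaluation_set": evaluation_set,
        "evaluation_rules": evaluation_rules,
        "evaluation_objects": evaluation_objects,
    }
    # sort_keys gives a canonical form, so equal content yields equal digests.
    serialized = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    digest = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
    return serialized, digest


def verify_snapshot(serialized, digest):
    """Check that stored snapshot information has not been altered."""
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest() == digest


evaluation_set = [{"group": "A", "rounds": ["question 1", "question 2"]}]
rules = ["prompt_word", "nlp"]
objects = ["model-x"]

serialized, digest = take_snapshot(evaluation_set, rules, objects)

# Editing the live evaluation set afterwards does not affect the snapshot.
evaluation_set[0]["rounds"].append("question 3")
restored = json.loads(serialized)
print(len(restored["evaluation_set"][0]["rounds"]), verify_snapshot(serialized, digest))
# 2 True
```

Storing the digest alongside the evaluation result is one way to bind the result to the exact inputs that produced it.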
[0098] Continuing with Figure 4, in some cases, after the snapshot of the first evaluation task is taken, the first subtask in the first evaluation task can also be detected. For example, the first evaluation task can be sent by a first sender, the first sender belongs to a first tenant, the first tenant corresponds to multiple workspaces, and the first subtask uses a first workspace among the multiple workspaces. In this case, obtaining the virtual end time of the first subtask based on the complexity of at least one evaluation rule, the number of dialogue rounds in the dialogue group, the number of at least one evaluation object, and the virtual time of the evaluation queue includes: obtaining the resource usage of the first subtask; detecting the first subtask based on at least one available resource quantity to obtain a detection result; and, in response to the detection result indicating that the resource usage of the first subtask is less than or equal to the at least one available resource quantity, obtaining the virtual end time of the first subtask based on the complexity of at least one evaluation rule, the number of dialogue rounds in the dialogue group, the number of at least one evaluation object, and the virtual time of the evaluation queue.
[0099] In a multi-tenant architecture intelligent agent platform, there are multiple workspaces. A workspace can be understood as an independent, isolated, and manageable container for resource and data logic. A tenant can own one or more workspaces. Different users under a tenant can use the workspace owned by the tenant to execute evaluation tasks. That is, a workspace can be the smallest resource unit corresponding to an evaluation task.
[0100] Detecting the first subtask can be understood as determining the quota for the first subtask in at least one dimension, that is, determining whether there are enough resources to execute the first subtask.
[0101] The resource usage of the first subtask can be understood as the amount of resources required to execute the first subtask. At least one available resource quantity can be understood as the available resource quantity in at least one dimension, such as the amount of computing resources, storage resources, etc. available in at least one dimension. For example, at least one available resource quantity may include at least one of the available resource quantity of the intelligent agent platform, the available resource quantity of the first tenant, the available resource quantity of the first sender, and the available resource quantity of the first workspace.
[0102] In other words, resource quantity is detected according to four levels: intelligent agent platform dimension, tenant dimension, user dimension, and workspace dimension. If the resource usage of the first subtask is less than or equal to at least one available resource quantity, the virtual end time of the first subtask is obtained, and the first subtask is added to the evaluation queue for execution. The evaluation result of the first evaluation task is obtained after all subtasks of the first evaluation task have been executed.
[0103] Thus, before executing the first subtask, dynamic quota detection in four dimensions ensures that the first subtask is executed with sufficient resources, reducing the occurrence of execution interruptions or exceptions due to insufficient resources during the execution of the first subtask.
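As a non-limiting illustration, the four-dimension quota detection described above can be sketched as a comparison of the resource usage of a subtask against each available resource quantity. The function name, the dimension keys, and the use of a single scalar resource quantity are assumptions; an actual platform may track several resource types per dimension.

```python
def detect_subtask(resource_usage, available_quantities):
    """Return (passed, exceeded_dimensions) for one subtask.

    The subtask passes only if its resource usage is less than or equal to
    the available quantity in every dimension.
    """
    exceeded = [
        dimension
        for dimension, quantity in available_quantities.items()
        if resource_usage > quantity
    ]
    return len(exceeded) == 0, exceeded


# Hypothetical available quantities for the four dimensions.
available = {
    "platform": 100,
    "tenant": 40,
    "sender": 10,
    "workspace": 8,
}

print(detect_subtask(8, available))   # (True, [])
print(detect_subtask(9, available))   # (False, ['workspace'])
print(detect_subtask(12, available))  # (False, ['sender', 'workspace'])
```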
[0104] Figure 5 illustrates a flowchart of a subtask retry mechanism provided in at least one scenario described herein.
[0105] As shown in Figure 5, the first subtask is detected based on the at least one available resource quantity. After the detection result is obtained, if the resource usage of the first subtask is less than or equal to the at least one available resource quantity, the virtual end time of the first subtask is obtained and the first subtask is added to the evaluation queue. If the resource usage of the first subtask exceeds any of the at least one available resource quantity, a subtask retry mechanism is applied.
[0106] For example, in response to the detection result indicating that the resource usage of the first subtask is greater than any one of the at least one available resource quantity, the first subtask is added to a retry queue so that it is re-detected after a set time.
[0107] The retry queue can be used to store subtasks whose resource usage exceeds the available resource quantity. That is, if the currently available resources are insufficient to execute the first subtask smoothly, the first subtask is added to the retry queue. After the set time, the available resources may have changed (for example, some resources have been released, increasing the available resources). The first subtask is then re-detected to determine whether its resource usage is less than or equal to the available resource quantity, that is, whether there are enough resources to execute the first subtask.
[0108] The set time can be obtained in different ways. For example, a fixed set time can be pre-configured, and the first subtask can be re-checked after a fixed set time after it is added to the retry queue. Alternatively, the re-checking of the first subtask and the checking of other subtasks can be interspersed by combining scheduling algorithms and resource usage.
[0109] In some possible implementations, the retry queue can be provided in the form of a message queue. For example, the retry queue can correspond to a retry topic in the message queue. In response to the detection result indicating that the resource usage of the first subtask is greater than any of the available resources in at least one available resource quantity, a message (e.g., including the first subtask identifier) is produced in the retry topic. When the first subtask is re-detected, the message is consumed from the retry topic.
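As a non-limiting illustration, the retry mechanism with a fixed set time can be sketched with an in-memory queue keyed on the time at which each subtask becomes due for re-detection. The class `RetryQueue`, the function `admit`, and the explicit `now` parameter are hypothetical; the sketch does not model a message queue or a retry topic, which would replace the in-memory heap in the implementation described above.

```python
import heapq


class RetryQueue:
    """Hypothetical retry queue: subtasks become due after a fixed delay."""

    def __init__(self, retry_delay):
        self.retry_delay = retry_delay
        self._heap = []

    def add(self, subtask_id, now):
        heapq.heappush(self._heap, (now + self.retry_delay, subtask_id))

    def pop_due(self, now):
        """Remove and return the subtasks whose set time has elapsed."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[1])
        return due


def admit(subtask_id, resource_usage, available, retry_queue, now):
    """Return True if the subtask may proceed, else park it for retry."""
    if all(resource_usage <= quantity for quantity in available.values()):
        return True
    retry_queue.add(subtask_id, now)
    return False


retry_queue = RetryQueue(retry_delay=5)
available = {"platform": 100, "workspace": 4}

first_attempt = admit("subtask-1", 6, available, retry_queue, now=0)

available["workspace"] = 8  # some resources are released in the meantime
due = retry_queue.pop_due(now=5)
second_attempt = [admit(s, 6, available, retry_queue, now=5) for s in due]

print(first_attempt, due, second_attempt)
# False ['subtask-1'] [True]
```

Passing the current time explicitly keeps the sketch deterministic; a real scheduler would read a clock instead.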
[0110] In this way, a closed loop of "insufficient available resources - re-entering the queue - trying again later" is formed. If a subtask (such as the first subtask) cannot be executed successfully, the backpressure loop described above will schedule the subtask that cannot be executed successfully to the retry queue, thereby reducing the blocking and resource consumption caused by the subtask that cannot be executed successfully and improving the execution efficiency of the evaluation task of the intelligent agent platform.
[0111] During the execution of the first subtask, a token bucket can be used to limit the execution of the subtask. The token bucket can be understood as a flow control and rate limiting method. Tokens are generated in the token bucket at a certain rate. Before a request (such as a subtask execution request) is executed, a token needs to be obtained from the token bucket to execute the request. If a token cannot be obtained from the token bucket, the request will be discarded.
[0112] In one or more scenarios described herein, different token buckets can be used to constrain different evaluation rules. For example, the at least one evaluation rule may include a first evaluation rule. In this case, executing the first subtask based on the dialogue group to obtain the evaluation result of the first subtask includes: using the token bucket corresponding to the first evaluation rule, executing the first subtask under the first evaluation rule based on the dialogue group, and obtaining the evaluation result of the first subtask.
[0113] In other words, different token buckets are configured for different evaluation rules, and the token generation rate of different token buckets corresponding to different evaluation rules can be different. For a subtask (e.g., the first subtask), the part of the subtask related to the evaluation rule is executed using the token bucket corresponding to the evaluation rule.
[0114] In this way, by configuring different token buckets for different evaluation rules, the execution rate of the evaluation process of different evaluation rules is limited, which conforms to the calling characteristics of evaluation rules. Furthermore, the evaluation processes of different evaluation rules do not affect each other, and the execution rate can be adaptively adjusted on a rule-by-rule basis, thereby improving the flexibility of the intelligent agent platform in task evaluation scenarios.
[0115] Furthermore, in response to alarm information generated during the execution of the first subtask, the token generation rate of the token bucket corresponding to the first evaluation rule can be reduced.
[0116] Alarm information can be understood as an indication that an anomaly occurred during the execution of the first subtask. For example, an anomaly may occur while the first subtask is being executed under the first evaluation rule based on the dialogue group. In this case, it is necessary to limit the execution rate of the first evaluation rule to reduce the possibility of congestion and increased resource consumption in subsequent subtasks.
[0117] Since the first evaluation rule is configured with a corresponding token bucket, the execution rate of the first evaluation rule can be quickly limited by reducing the token generation rate of the token bucket corresponding to the first evaluation rule. For example, by configuring the token generation rate of the token bucket corresponding to the first evaluation rule to 0, the execution of the subtask corresponding to the first evaluation rule can be circuit-broken. When there is an anomaly in the execution of the first evaluation rule, it can be stopped in time, and after recovery, the token generation rate of the token bucket corresponding to the first evaluation rule can be reconfigured to continue the execution of the subtask corresponding to the first evaluation rule.
[0118] In this way, by promptly reducing the token generation rate of the token bucket corresponding to the first evaluation rule, circuit breaking can be implemented in a timely manner in the event of an anomaly during the execution of the subtask, forming an end-to-end flow control closed loop and reducing the resource consumption of the intelligent agent platform under the same concurrency.
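As a non-limiting illustration, per-rule token buckets and the alarm-driven rate reduction described above can be sketched together as follows. The class `TokenBucket`, the function `on_alarm`, the rule names, and the rates and capacities are hypothetical. Setting the token generation rate to 0 acts as the circuit break, and restoring a positive rate resumes execution. The current time is passed explicitly so the sketch is deterministic.

```python
class TokenBucket:
    """Hypothetical token bucket with an adjustable generation rate."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate            # tokens generated per unit of time
        self.capacity = capacity    # maximum number of stored tokens
        self.tokens = float(capacity)
        self.updated_at = now

    def _refill(self, now):
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = now

    def try_acquire(self, now):
        """Take one token if available; otherwise the request is rejected."""
        self._refill(now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def set_rate(self, rate, now):
        self._refill(now)  # settle tokens earned at the old rate first
        self.rate = rate


# One bucket per evaluation rule, each with its own generation rate.
buckets = {
    "prompt_word": TokenBucket(rate=10.0, capacity=5),
    "code_capability": TokenBucket(rate=1.0, capacity=1),
}


def on_alarm(rule_name, now, new_rate=0.0):
    """Reduce the rate for one rule; a rate of 0 is a circuit break."""
    buckets[rule_name].set_rate(new_rate, now)


code_bucket = buckets["code_capability"]
results = [code_bucket.try_acquire(now=0.0)]      # bucket starts full
results.append(code_bucket.try_acquire(now=0.5))  # only half a token so far
on_alarm("code_capability", now=0.5)              # anomaly: stop generation
results.append(code_bucket.try_acquire(now=10.0)) # still half a token
code_bucket.set_rate(1.0, now=10.0)               # recovery: restore rate
results.append(code_bucket.try_acquire(now=10.5))
other_rule_ok = buckets["prompt_word"].try_acquire(now=10.5)

print(results, other_rule_ok)
# [True, False, False, True] True
```

Because each rule has its own bucket, the circuit break on one rule leaves the other rule's bucket untouched.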
[0119] Based on the evaluation task processing method based on an intelligent agent platform provided in at least one aspect herein, an evaluation task processing device based on an intelligent agent platform is also provided in at least one aspect herein. The evaluation task processing device based on the intelligent agent platform is described in detail below with reference to Figure 6.
[0120] Figure 6 illustrates the structure of an evaluation task processing device based on an intelligent agent platform, as provided in at least one of the embodiments described herein.
[0121] As shown in Figure 6, in this scenario, the evaluation task processing device 600 based on an intelligent agent platform is used to evaluate evaluation objects related to large models. The evaluation task processing device 600 based on the intelligent agent platform includes an acquisition module 601 and an execution module 602. For example, the acquisition module 601 and the execution module 602 can be implemented by hardware (e.g., circuit) modules or software modules, etc. The same applies to the cases below and will not be repeated. For example, the acquisition module 601 and the execution module 602 can be implemented by a central processing unit (CPU), a general-purpose graphics processing unit (GPGPU), a graphics processing unit (GPU), a tensor processing unit (TPU), a field-programmable gate array (FPGA), or other forms of processing units with data processing capabilities and/or instruction execution capabilities, together with corresponding computer instructions.
[0122] The acquisition module 601 is configured to acquire a first evaluation task, wherein the first evaluation task indicates that at least one evaluation object is evaluated using a first evaluation set under at least one evaluation rule, the first evaluation set including multiple dialogue groups, and each of the multiple dialogue groups including at least one round of dialogue. For example, the acquisition module 601 can be configured to execute step S301 described above; its specific implementation principle can be found in the relevant description of step S301, and will not be repeated here.
[0123] The execution module 602 is configured to perform the following steps for each of the plurality of dialogue groups: obtaining the virtual end time of a first subtask based on the complexity of the at least one evaluation rule, the number of dialogue rounds in the dialogue group, the number of the at least one evaluation object, and the virtual time of the evaluation queue, wherein the first subtask is the subtask corresponding to the dialogue group; adding the first subtask to the evaluation queue based on its virtual end time, wherein the virtual end times of each subtask in the evaluation queue gradually increase; and executing the first subtask based on the dialogue group in response to the first subtask being at the head of the evaluation queue, thereby obtaining the evaluation result of the first subtask. For example, the execution module 602 can be configured to execute step S302 described above; its specific implementation principle can be found in the relevant description of step S302, and will not be repeated here.
[0124] In at least one embodiment of this document, the execution module 602 is further configured to: obtain the complexity coefficient of the first subtask based on the complexity of the at least one evaluation rule; obtain the size coefficient of the first subtask based on the number of dialogue rounds in the dialogue group and the number of the at least one evaluation object; and obtain the virtual time of the evaluation queue; and obtain the virtual end time of the first subtask based on the complexity coefficient of the first subtask, the size coefficient of the first subtask, and the virtual time of the evaluation queue.
[0125] In at least one of the embodiments described herein, the execution module 602 is further configured to: update the virtual time of the evaluation queue to a set initial time in response to the evaluation queue being empty; or update the virtual time of the evaluation queue to the virtual end time of the second subtask in response to the completion of the second subtask in the evaluation queue.
[0126] In at least one of the embodiments described herein, the evaluation task processing device 600 based on the intelligent agent platform further includes a snapshot module, which is configured to take a snapshot of the first evaluation set, the at least one evaluation rule, and the at least one evaluation object of the first evaluation task to obtain first snapshot information.
[0127] In at least one scenario described herein, the first evaluation task is sent by a first sender, the first sender belongs to a first tenant, the first tenant corresponds to multiple workspaces, the first subtask uses the first workspace among the multiple workspaces, and the execution module 602 is further configured to: obtain the resource usage quantity of the first subtask; detect the first subtask based on at least one available resource quantity to obtain a detection result, wherein the at least one available resource quantity includes at least one of the available resource quantity of the intelligent agent platform, the available resource quantity of the first tenant, the available resource quantity of the first sender, and the available resource quantity of the first workspace; in response to the detection result indicating that the resource usage quantity of the first subtask is less than or equal to the at least one available resource quantity, obtain the virtual end time of the first subtask based on the complexity of the at least one evaluation rule, the number of dialogue rounds of the dialogue group, the number of the at least one evaluation object, and the virtual time of the evaluation queue.
[0128] In at least one of the embodiments herein, the execution module 602 is further configured to: add the first subtask to a retry queue in response to the detection result indicating that the resource usage of the first subtask is greater than any of the at least one available resource quantity, so as to re-detect the first subtask after a set time.
[0129] In at least one of the embodiments described herein, the at least one evaluation rule includes a first evaluation rule, and the execution module 602 is further configured to: utilize the token bucket corresponding to the first evaluation rule, based on the dialogue group, execute the first subtask under the first evaluation rule, and obtain the evaluation result of the first subtask.
[0130] In at least one of the embodiments described herein, the execution module 602 is further configured to: reduce the token generation rate of the token bucket corresponding to the first evaluation rule in response to an alarm message generated during the execution of the first subtask.
[0131] It should be noted that, for clarity and brevity, not all components of the evaluation task processing device 600 based on the intelligent agent platform are shown in at least one of the embodiments herein. To achieve the necessary functions of the evaluation task processing device 600 based on the intelligent agent platform, those skilled in the art may provide or configure other components not shown, according to specific needs, and the embodiments herein do not impose any limitations on this.
[0132] The evaluation task processing device 600 based on the intelligent agent platform provided in at least one aspect of this article and the evaluation task processing method based on the intelligent agent platform provided in at least one aspect of this article are based on the same inventive concept and can achieve the same technical effect and the same technical purpose as the evaluation task processing method based on the intelligent agent platform provided in at least one aspect of this article. For details, please refer to the relevant description above, which will not be repeated here.
[0133] This document also provides, in at least one embodiment, an electronic device including a processing device and a storage device, the storage device including one or more computer program modules; wherein the one or more computer program modules are stored in the storage device and configured to be executed by the processing device, the one or more computer program modules being used to implement the evaluation task processing method based on an intelligent agent platform provided in any embodiment of this document.
[0134] For example, the processing device may be a processor, such as a central processing unit (CPU), a digital signal processor (DSP), a graphics processing unit (GPU), a general-purpose graphics processing unit (GPGPU), or another form of processing unit with data processing capabilities and/or instruction execution capabilities. It may be a general-purpose processor or a dedicated processor and may control other components in the electronic device to perform the desired functions.
[0135] For example, the storage device may be a memory, which may include one or more computer program products. These computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may, for example, include random access memory (RAM) and/or cache memory. The non-volatile memory may, for example, include read-only memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium, and a processing device may execute these program instructions to implement the functions described in at least one of the embodiments herein (implemented by the processing device) and/or other desired functions. Various application programs and various data may also be stored on the computer-readable storage medium, which is not limited in the embodiments described herein.
[0136] The following is for reference. Figure 7 The diagram illustrates a structural schematic of an electronic device (e.g., a terminal device or a server) 700 suitable for implementing at least one of the embodiments described herein. The terminal device in at least one embodiment may include, but is not limited to, mobile terminals such as mobile phones, laptops, digital radio receivers, personal digital assistants (PDAs), tablet computers (PADs), portable multimedia players (PMPs), in-vehicle terminals (e.g., in-vehicle navigation terminals), and fixed terminals such as digital televisions and desktop computers. Figure 7The electronic device shown is merely an example and should not impose any limitation on the functionality and scope of use of at least one of the situations described herein.
[0137] As shown in Figure 7, the electronic device 700 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 701, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage device 708 into a random access memory (RAM) 703. The RAM 703 also stores various programs and data required for the operation of the electronic device 700. The processing device 701, ROM 702, and RAM 703 are interconnected via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
[0138] Typically, the following devices can be connected to the I/O interface 705: input devices 706 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices 707 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; storage devices 708 including, for example, magnetic tapes, hard disks, etc.; and communication devices 709. The communication device 709 allows the electronic device 700 to communicate with other devices, wirelessly or by wire, to exchange data. Although Figure 7 shows an electronic device 700 with various devices, it should be understood that it is not required to implement or possess all of the devices shown. More or fewer devices may alternatively be implemented or possessed.
[0139] In particular, according to one or more embodiments herein, the processes described in the above-referenced flowcharts can be implemented as computer software programs. For example, one or more embodiments herein include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program can be downloaded and installed from a network via the communication device 709, or installed from the storage device 708, or installed from the ROM 702. When the computer program is executed by the processing device 701, it performs the functions defined in the methods of at least one embodiment herein.
[0140] The electronic device 700 provided in at least one embodiment herein is based on the same inventive concept as the evaluation task processing method based on the intelligent agent platform provided in at least one embodiment herein, and can achieve the same technical effect and the same technical purpose as that method. For details, please refer to the relevant description above, which will not be repeated here.
[0141] It should be noted that the computer-readable medium described above can be a computer-readable signal medium, a computer-readable storage medium, or any combination thereof. A computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In this document, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In this document, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium may be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to: wires, optical fibers, radio frequency (RF), etc., or any suitable combination thereof.
[0142] The computer-readable storage medium provided in at least one embodiment herein is based on the same inventive concept as the evaluation task processing method based on the intelligent agent platform provided in at least one embodiment herein, and can achieve the same technical effect and the same technical purpose as that method. For details, please refer to the relevant descriptions above, which will not be repeated here.
[0143] In some implementations, clients and servers can communicate using any currently known or future-developed network protocol, such as the Hypertext Transfer Protocol (HTTP), and can be interconnected by digital data communication (e.g., communication networks) of any form or medium. Examples of communication networks include local area networks (LANs), wide area networks (WANs), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
[0144] The aforementioned computer-readable medium may be included in the aforementioned electronic device, or it may exist independently without being assembled into the electronic device.
[0145] The aforementioned computer-readable medium carries one or more programs, which, when executed by the electronic device, cause the electronic device to perform the aforementioned evaluation task processing method based on the intelligent agent platform.
[0146] Computer program code for performing the operations described herein may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., via the Internet using an Internet service provider).
[0147] One or more embodiments of this document also provide a computer program product comprising one or more computer instructions. When these computer instructions are loaded and executed on a computing device, all or part of the processes or functions described in any of these embodiments are generated.
[0148] The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, or data center to another website, computer, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means.
[0149] When the computer program product is executed by a computer, the computer performs any of the aforementioned evaluation task processing methods based on the intelligent agent platform. The computer program product can be a software installation package; when any of the aforementioned evaluation task processing methods based on the intelligent agent platform is required, the computer program product can be downloaded and executed on the computer.
[0150] The computer program product provided in at least one embodiment herein is based on the same inventive concept as the evaluation task processing method based on the intelligent agent platform provided in at least one embodiment herein, and can achieve the same technical effect and the same technical purpose as that method. For details, please refer to the relevant descriptions above, which will not be repeated here.
[0151] The flowcharts and block diagrams in the accompanying figures illustrate the architecture, functionality, and operation of possible implementations of the systems, methods, and computer program products according to the various scenarios described herein. In this respect, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the figures. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, may be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.
[0152] The units or modules described in at least one of the scenarios herein can be implemented in software or hardware. The names of the units or modules do not, in some cases, constitute a limitation on the unit or module itself.
[0153] The functions described above in this document can be performed at least in part by one or more hardware logic components. For example, exemplary types of hardware logic components that can be used, without limitation, include: field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), system-on-a-chip (SoCs), complex programmable logic devices (CPLDs), and so on.
[0154] In the context of this document, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
[0155] Based on one or more scenarios described herein, Example 1 provides a method for processing evaluation tasks based on an intelligent agent platform, wherein the intelligent agent platform is used to evaluate evaluation objects related to large models, and the method includes:
[0156] Obtain a first evaluation task, wherein the first evaluation task instructs: to evaluate at least one evaluation object using a first evaluation set under at least one evaluation rule, the first evaluation set including multiple dialogue groups, each of the multiple dialogue groups including at least one round of dialogue;
[0157] For each of the plurality of dialogue groups, the following steps are performed: Based on the complexity of the at least one evaluation rule, the number of dialogue rounds in the dialogue group, the number of the at least one evaluation object, and the virtual time of the evaluation queue, the virtual end time of the first subtask is obtained, wherein the first subtask is the subtask corresponding to the dialogue group; based on the virtual end time of the first subtask, the first subtask is added to the evaluation queue, wherein the virtual end times of each subtask in the evaluation queue gradually increase; in response to the first subtask being at the head of the evaluation queue, based on the dialogue group, the first subtask is executed to obtain the evaluation result of the first subtask.
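As a non-limiting illustration of the queue ordering in Example 1, the following Python sketch keeps subtasks ordered by virtual end time so that the subtask with the smallest virtual end time is at the head. The class name `EvaluationQueue`, the subtask identifiers, and the numeric virtual end times are hypothetical and do not come from the embodiments above; a binary heap is only one possible way to maintain the ordering.

```python
import heapq
import itertools


class EvaluationQueue:
    """Min-heap keyed by virtual end time; the head has the smallest value."""

    def __init__(self):
        self._heap = []
        # Tie-breaker (assumption): equal virtual end times keep insertion order.
        self._seq = itertools.count()

    def add(self, subtask_id, virtual_end_time):
        heapq.heappush(self._heap, (virtual_end_time, next(self._seq), subtask_id))

    def head(self):
        # Peek at the subtask that would be executed next, or None if empty.
        return self._heap[0][2] if self._heap else None

    def pop_head(self):
        virtual_end_time, _, subtask_id = heapq.heappop(self._heap)
        return subtask_id, virtual_end_time


queue = EvaluationQueue()
queue.add("tenant-a/group-1", 12.0)  # hypothetical large subtask
queue.add("tenant-b/group-1", 4.0)   # hypothetical small subtask
queue.add("tenant-a/group-2", 9.0)

# Subtasks leave the queue in ascending order of virtual end time.
execution_order = [queue.pop_head()[0] for _ in range(3)]
```

In this sketch the small subtask of the second tenant is executed before both subtasks of the first tenant, even though it was added later, which is the kind of interleaving the ordering by virtual end time is meant to allow.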
[0158] Based on one or more scenarios described herein, Example 2 provides the method of Example 1, wherein obtaining the virtual end time of the first subtask based on the complexity of the at least one evaluation rule, the number of dialogue rounds in the dialogue group, the number of the at least one evaluation object, and the virtual time of the evaluation queue includes:
[0159] Based on the complexity of the at least one evaluation rule, obtain the complexity coefficient of the first subtask; based on the number of dialogue rounds of the dialogue group and the number of the at least one evaluation object, obtain the scale coefficient of the first subtask; and obtain the virtual time of the evaluation queue.
[0160] The virtual end time of the first subtask is obtained based on the complexity coefficient of the first subtask, the scale coefficient of the first subtask, and the virtual time of the evaluation queue.
[0161] According to one or more scenarios described herein, Example 3 provides the method of Example 1, further including:
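The calculation in Example 2 may be sketched as follows. This passage does not state how the coefficients are combined, so the formulas below are assumptions made purely for illustration: the complexity coefficient is taken as the sum of per-rule complexities, the scale coefficient as the product of dialogue rounds and evaluation objects, and the virtual end time as the queue's virtual time plus the product of the two coefficients. All function names are hypothetical.

```python
def complexity_coefficient(rule_complexities):
    # Assumption: the complexities of the evaluation rules simply add up.
    return float(sum(rule_complexities))


def scale_coefficient(dialogue_rounds, num_objects):
    # Assumption: every dialogue round is evaluated once per evaluation object.
    return float(dialogue_rounds * num_objects)


def virtual_end_time(queue_virtual_time, rule_complexities, dialogue_rounds, num_objects):
    # Assumption: estimated cost = complexity coefficient * scale coefficient,
    # added to the current virtual time of the evaluation queue.
    cost = complexity_coefficient(rule_complexities) * scale_coefficient(
        dialogue_rounds, num_objects
    )
    return queue_virtual_time + cost


# Two dialogue groups submitted while the queue's virtual time is 10.0.
short_group = virtual_end_time(10.0, [1.0, 2.0], dialogue_rounds=2, num_objects=1)
long_group = virtual_end_time(10.0, [1.0, 2.0], dialogue_rounds=8, num_objects=2)
```

Under these assumptions the shorter dialogue group receives the smaller virtual end time and would therefore be placed closer to the head of the evaluation queue.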
[0162] In response to the evaluation queue being empty, the virtual time of the evaluation queue is updated to the set initial time; or
[0163] In response to the completion of the second subtask in the evaluation queue, the virtual time of the evaluation queue is updated to the virtual end time of the second subtask.
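The two update rules of Example 3 can be illustrated with a small Python sketch. The class name `VirtualClock`, its method names, and the initial time of 0.0 are hypothetical; the embodiments above only require that the virtual time be reset to a set initial time when the queue is empty and advanced to the virtual end time of a completed subtask.

```python
class VirtualClock:
    """Tracks the virtual time of an evaluation queue (illustrative only)."""

    def __init__(self, initial_time=0.0):
        self.initial_time = initial_time
        self.virtual_time = initial_time

    def on_subtask_completed(self, subtask_virtual_end_time):
        # Advance the virtual time to the virtual end time of the finished subtask.
        self.virtual_time = subtask_virtual_end_time

    def on_queue_empty(self):
        # Reset to the set initial time once nothing is waiting in the queue.
        self.virtual_time = self.initial_time


clock = VirtualClock(initial_time=0.0)
clock.on_subtask_completed(16.0)
time_after_completion = clock.virtual_time
clock.on_queue_empty()
time_after_reset = clock.virtual_time
```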
[0164] According to one or more scenarios described herein, Example 4 provides the method of Example 1, wherein, after obtaining the first evaluation task, the method further includes:
[0165] A snapshot is taken of the first evaluation set, the at least one evaluation rule, and the at least one evaluation object of the first evaluation task to obtain first snapshot information.
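One possible way to take the snapshot of Example 4 is a deep copy made at submission time, so that later edits to the evaluation set, rules, or objects do not affect subtasks that are already queued. The sketch below is an assumption about how such a snapshot could be realized in Python; the function name `take_snapshot`, the dictionary keys, and the sample data are hypothetical.

```python
import copy
from types import MappingProxyType


def take_snapshot(evaluation_set, evaluation_rules, evaluation_objects):
    # Deep-copy the inputs so that the snapshot is isolated from later edits,
    # and wrap the result in a read-only mapping at the top level.
    return MappingProxyType({
        "evaluation_set": copy.deepcopy(evaluation_set),
        "evaluation_rules": copy.deepcopy(evaluation_rules),
        "evaluation_objects": copy.deepcopy(evaluation_objects),
    })


evaluation_set = [["hello", "hi, how can I help?"], ["what is 2 + 2?", "4"]]
rules = [{"name": "accuracy", "complexity": 2}]
objects = ["model-a"]

snapshot = take_snapshot(evaluation_set, rules, objects)

# Edits made after submission do not change the snapshot.
evaluation_set[0].append("edited after submission")
rules[0]["complexity"] = 5
```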
[0166] Based on one or more scenarios described herein, Example 5 provides the method of Example 1, wherein the first evaluation task is sent by a first sender, the first sender belongs to a first tenant, the first tenant corresponds to multiple workspaces, and the first subtask uses a first workspace among the multiple workspaces.
[0167] The step of obtaining the virtual end time of the first subtask based on the complexity of the at least one evaluation rule, the number of dialogue rounds in the dialogue group, the number of the at least one evaluation object, and the virtual time of the evaluation queue includes:
[0168] Obtain the resource usage of the first subtask;
[0169] Based on at least one available resource quantity, the first subtask is detected to obtain a detection result, wherein the at least one available resource quantity includes at least one of the available resource quantity of the intelligent agent platform, the available resource quantity of the first tenant, the available resource quantity of the first sender, and the available resource quantity of the first workspace.
[0170] In response to the detection result indicating that the resource usage of the first subtask is less than or equal to the at least one available resource quantity, the virtual end time of the first subtask is obtained based on the complexity of the at least one evaluation rule, the number of dialogue rounds in the dialogue group, the number of the at least one evaluation object, and the virtual time of the evaluation queue.
[0171] According to one or more scenarios described herein, Example 6 provides the method of Example 5, further including:
[0172] In response to the detection result indicating that the resource usage of the first subtask is greater than any of the at least one available resource quantity, the first subtask is added to the retry queue for re-detection after a set time.
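The resource detection of Examples 5 and 6 may be sketched as follows. The sketch reads the condition as: a subtask is admitted only if its resource usage fits within every available resource quantity being checked, and is otherwise placed in a retry queue for re-detection after a set time. The function names, the quota levels shown, the numeric values, and the 5-second delay are all hypothetical.

```python
from collections import deque


def passes_resource_check(usage, available_quantities):
    # True only when the usage fits within every checked quantity
    # (platform, tenant, sender, and/or workspace).
    return all(usage <= quantity for quantity in available_quantities.values())


def admit_or_retry(subtask_id, usage, available_quantities, retry_queue, now,
                   retry_delay=5.0):
    if passes_resource_check(usage, available_quantities):
        # The caller would now compute the virtual end time and enqueue the subtask.
        return "admitted"
    # Usage exceeds at least one quantity: schedule a re-detection later.
    retry_queue.append((now + retry_delay, subtask_id))
    return "retry"


available = {"platform": 100, "tenant": 20, "sender": 10, "workspace": 8}
retry_queue = deque()

first_result = admit_or_retry("subtask-1", 8, available, retry_queue, now=0.0)
second_result = admit_or_retry("subtask-2", 9, available, retry_queue, now=0.0)
```

Here the second subtask fits within the platform, tenant, and sender quantities but exceeds the workspace quantity, so it is deferred rather than queued for evaluation.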
[0173] According to one or more scenarios described herein, Example 7 provides the method of any of Examples 1 to 6, wherein the at least one evaluation rule includes a first evaluation rule, and executing the first subtask based on the dialogue group to obtain the evaluation result of the first subtask includes:
[0174] Using the token bucket corresponding to the first evaluation rule, the first subtask is executed under the first evaluation rule based on the dialogue group, and the evaluation result of the first subtask is obtained.
[0175] According to one or more scenarios described herein, Example 8 provides the method of Example 7, further including:
[0176] In response to an alarm message generated during the execution of the first subtask, the token generation rate of the token bucket corresponding to the first evaluation rule is reduced.
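The per-rule token bucket of Examples 7 and 8 may be illustrated with the following Python sketch. It is a generic token bucket, not the specific implementation of the embodiments: the class name `TokenBucket`, the halving factor applied on an alarm, the minimum rate, and the explicit `now` argument (used here instead of a wall clock so that the behaviour is deterministic) are all assumptions.

```python
class TokenBucket:
    """Generic token bucket whose generation rate can be lowered on an alarm."""

    def __init__(self, rate, capacity):
        self.rate = rate            # tokens generated per unit of time
        self.capacity = capacity    # maximum number of stored tokens
        self.tokens = capacity
        self.last_refill = 0.0

    def _refill(self, now):
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now

    def try_acquire(self, now, cost=1.0):
        # Execute under the rule only if enough tokens are available.
        self._refill(now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def on_alarm(self, factor=0.5, min_rate=0.1):
        # Assumption: an alarm multiplies the generation rate by `factor`,
        # never dropping below `min_rate`.
        self.rate = max(min_rate, self.rate * factor)


bucket = TokenBucket(rate=2.0, capacity=2.0)
results = [
    bucket.try_acquire(now=0.0),  # uses the first stored token
    bucket.try_acquire(now=0.0),  # uses the second stored token
    bucket.try_acquire(now=0.0),  # bucket is empty
    bucket.try_acquire(now=0.5),  # one token regenerated at rate 2.0
]
bucket.on_alarm()                  # rate drops from 2.0 to 1.0
results.append(bucket.try_acquire(now=1.0))  # only 0.5 tokens regenerated
results.append(bucket.try_acquire(now=1.5))  # a full token is available again
```

After the alarm, the same half-unit interval that previously produced one token produces only half a token, so executions under the first evaluation rule are spaced further apart.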
[0177] According to one or more scenarios described herein, Example 9 provides an evaluation task processing apparatus based on an intelligent agent platform, the intelligent agent platform being used to evaluate evaluation objects related to a large model, the apparatus comprising:
[0178] The acquisition module is configured to: acquire a first evaluation task, wherein the first evaluation task indicates: to evaluate at least one evaluation object using a first evaluation set under at least one evaluation rule, the first evaluation set including multiple dialogue groups, each of the multiple dialogue groups including at least one round of dialogue;
[0179] The execution module is configured to perform the following steps for each of the plurality of dialogue groups: obtaining the virtual end time of a first subtask based on the complexity of the at least one evaluation rule, the number of dialogue rounds in the dialogue group, the number of the at least one evaluation object, and the virtual time of the evaluation queue, wherein the first subtask is a subtask corresponding to the dialogue group; adding the first subtask to the evaluation queue based on its virtual end time, wherein the virtual end times of each subtask in the evaluation queue gradually increase; and, in response to the first subtask being at the head of the evaluation queue, executing the first subtask based on the dialogue group to obtain the evaluation result of the first subtask.
[0180] According to one or more scenarios described herein, Example 10 provides an electronic device comprising:
[0181] At least one processor; and
[0182] At least one memory, including one or more computer program instructions;
[0183] Wherein the one or more computer program instructions, when executed by the at least one processor, perform the evaluation task processing method based on an intelligent agent platform provided herein.
[0184] The above description is merely a preferred embodiment and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of disclosure herein is not limited to technical solutions formed by specific combinations of the above-described technical features, but also includes other technical solutions formed by arbitrary combinations of the above-described technical features or their equivalents without departing from the above-disclosed concept. One example is a technical solution formed by replacing the above features with (but not limited to) technical features disclosed herein that have similar functions.
[0185] Furthermore, while the operations are described in a specific order, this should not be construed as requiring these operations to be performed in the specific order shown or in a sequential order. In certain contexts, multitasking and parallel processing may be advantageous. Similarly, while some specific implementation details are included in the above discussion, these should not be interpreted as limiting the scope of this disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments individually or in any suitable sub-combination.
[0186] Although the subject matter has been described using language specific to structural features and/or methodological logical actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. Rather, the specific features and actions described above are merely illustrative examples of implementing the claims.
Claims
1. A method for processing evaluation tasks based on an intelligent agent platform, wherein, The intelligent agent platform is used to evaluate evaluation objects related to large models, and the method includes: Obtain a first evaluation task, wherein the first evaluation task instructs: to evaluate at least one evaluation object using a first evaluation set under at least one evaluation rule, the first evaluation set including multiple dialogue groups, each of the multiple dialogue groups including at least one round of dialogue; For each of the plurality of dialogue groups, perform the following steps: The virtual end time of the first subtask is obtained based on the complexity of the at least one evaluation rule, the number of dialogue rounds of the dialogue group, the number of the at least one evaluation object, and the virtual time of the evaluation queue, wherein the first subtask is the subtask corresponding to the dialogue group. Based on the virtual end time of the first subtask, the first subtask is added to the evaluation queue, wherein the virtual end time of each subtask in the evaluation queue gradually increases; In response to the first subtask being at the head of the evaluation queue, the first subtask is executed based on the dialogue group to obtain the evaluation result of the first subtask.
2. The method according to claim 1, wherein, The step of obtaining the virtual end time of the first subtask based on the complexity of the at least one evaluation rule, the number of dialogue rounds in the dialogue group, the number of the at least one evaluation object, and the virtual time of the evaluation queue includes: Based on the complexity of the at least one evaluation rule, obtain the complexity coefficient of the first subtask; based on the number of dialogue rounds of the dialogue group and the number of the at least one evaluation object, obtain the scale coefficient of the first subtask; and obtain the virtual time of the evaluation queue. The virtual end time of the first subtask is obtained based on the complexity coefficient of the first subtask, the scale coefficient of the first subtask, and the virtual time of the evaluation queue.
3. The method according to claim 1, further comprising: In response to the evaluation queue being empty, the virtual time of the evaluation queue is updated to the set initial time; or In response to the completion of the second subtask in the evaluation queue, the virtual time of the evaluation queue is updated to the virtual end time of the second subtask.
4. The method according to claim 1, wherein, After obtaining the first evaluation task, the method further includes: A snapshot is taken of the first evaluation set, the at least one evaluation rule, and the at least one evaluation object of the first evaluation task to obtain first snapshot information.
5. The method according to claim 1, wherein, The first evaluation task is sent by a first sender, which belongs to a first tenant. The first tenant corresponds to multiple workspaces, and the first subtask uses the first workspace among the multiple workspaces. The step of obtaining the virtual end time of the first subtask based on the complexity of the at least one evaluation rule, the number of dialogue rounds in the dialogue group, the number of the at least one evaluation object, and the virtual time of the evaluation queue includes: Obtain the resource usage of the first subtask; Based on at least one available resource quantity, the first subtask is detected to obtain a detection result, wherein the at least one available resource quantity includes at least one of the available resource quantity of the intelligent agent platform, the available resource quantity of the first tenant, the available resource quantity of the first sender, and the available resource quantity of the first workspace. In response to the detection result indicating that the resource usage of the first subtask is less than or equal to the at least one available resource quantity, the virtual end time of the first subtask is obtained based on the complexity of the at least one evaluation rule, the number of dialogue rounds in the dialogue group, the number of the at least one evaluation object, and the virtual time of the evaluation queue.
6. The method according to claim 5, further comprising: In response to the detection result indicating that the resource usage of the first subtask is greater than any of the at least one available resource quantity, the first subtask is added to the retry queue for re-detection after a set time.
7. The method according to any one of claims 1 to 6, wherein, The at least one evaluation rule includes a first evaluation rule. The step of executing the first subtask based on the dialogue group and obtaining the evaluation result of the first subtask includes: Using the token bucket corresponding to the first evaluation rule, the first subtask is executed under the first evaluation rule based on the dialogue group, and the evaluation result of the first subtask is obtained.
8. The method according to claim 7, further comprising: In response to an alarm message generated during the execution of the first subtask, the token generation rate of the token bucket corresponding to the first evaluation rule is reduced.
9. An evaluation task processing device based on an intelligent agent platform, wherein, The intelligent agent platform is used to evaluate evaluation objects related to large models, and the device includes: The acquisition module is configured to: acquire a first evaluation task, wherein the first evaluation task indicates: to evaluate at least one evaluation object using a first evaluation set under at least one evaluation rule, the first evaluation set including multiple dialogue groups, each of the multiple dialogue groups including at least one round of dialogue; The execution module is configured to perform the following steps for each of the plurality of dialogue groups: obtaining the virtual end time of a first subtask based on the complexity of the at least one evaluation rule, the number of dialogue rounds in the dialogue group, the number of the at least one evaluation object, and the virtual time of the evaluation queue, wherein the first subtask is a subtask corresponding to the dialogue group; adding the first subtask to the evaluation queue based on its virtual end time, wherein the virtual end times of each subtask in the evaluation queue gradually increase; and, in response to the first subtask being at the head of the evaluation queue, executing the first subtask based on the dialogue group to obtain the evaluation result of the first subtask.
10. An electronic device, comprising: At least one processor; and At least one memory, including one or more computer program instructions; The one or more computer program instructions are executed by the at least one processor to perform the method according to any one of claims 1 to 8.