Agent training evaluation method and device, electronic equipment and storage medium
By initializing the virtual simulation environment state and task description information defined by the front-end code, a virtual interactive page is generated to train or evaluate the agent. This solves the problem of limited environment state in the existing technology, realizes efficient training and evaluation of the agent in diverse scenarios, and improves the accuracy of task execution.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: INST OF AUTOMATION CHINESE ACAD OF SCI
- Filing Date: 2026-03-26
- Publication Date: 2026-06-19
AI Technical Summary
Existing technologies rely on physical machines or heavy-duty virtualization simulators to build training and evaluation environments. This makes it difficult to flexibly and conveniently initialize the virtual operating system and virtual application states, and to build diverse task training scenarios, resulting in low task execution accuracy of intelligent agents in different scenarios.
By initializing the state and task description information of the virtual simulation environment defined by the front-end code, a virtual interactive page is generated, and these inputs are used to train or evaluate the agent, thereby enabling flexible configuration of the environment state and the construction of diverse scenarios.
Freed from the constraints of hardware devices and physical applications, the method can efficiently build diverse training and evaluation scenarios, improving the accuracy with which intelligent agents perform tasks in different scenarios.
Figure CN122240243A_ABST
Description
Technical Field
[0001] This invention relates to the field of computer technology, and in particular to a method, apparatus, electronic device, and storage medium for training and evaluating intelligent agents.
Background Technology
[0002] With the development of artificial intelligence technology, intelligent agents capable of autonomously performing mobile device operation tasks have become a research and application hotspot. To ensure the accuracy and reliability of intelligent agents in performing various operation tasks, an efficient and controllable training and evaluation method is needed. By constructing a simulation environment that provides intelligent agents with operation interaction scenarios, systematic training and accurate evaluation of an agent's action output and task execution effect can be achieved.
[0003] Currently, the training and evaluation of mobile device operation agents mostly rely on physical devices or heavy-duty virtualization simulators. Specifically, actual applications are installed on physical devices or simulators to recreate the real device operation environment. The screen display of the device or simulator is used as the input information for the agent. The agent outputs the corresponding device operation action based on the screen. The device or simulator executes the action and provides feedback on the new screen display. Finally, the content in the screen is analyzed through visual recognition to determine the task execution result of the agent, thus completing the training and evaluation of the agent.
[0004] However, this approach has many drawbacks in practical applications. For example, because the training and evaluation environment is built on physical machines or heavy virtualization simulators, it is limited by hardware devices and actual applications, making it difficult to flexibly and conveniently initialize the state of virtual operating systems and virtual applications. It is also difficult to build diverse task training scenarios, so intelligent agents lack sufficient training across multiple scenarios, leading to low task execution accuracy in different scenarios.
Summary of the Invention
[0005] This invention provides a training and evaluation method, apparatus, electronic device, and storage medium for intelligent agents. It addresses the shortcomings of existing training and evaluation environments built on physical machines or heavy virtualization simulators, which are limited by hardware devices and actual applications, making it impossible to build diverse task training scenarios. This invention frees the environment from the constraints of hardware devices and physical applications, allowing for flexible configuration of the environment state. This enables the construction of diverse training and evaluation scenarios, thereby achieving training and evaluation of intelligent agents in multiple scenarios and improving the accuracy of task execution by intelligent agents in different scenarios.
[0006] This invention provides a method for training and evaluating an intelligent agent, comprising: initializing the state of a virtual simulation environment defined by front-end code and task description information, the virtual simulation environment including a virtual operating system and virtual applications; generating a virtual interactive page based on the state of the virtual operating system and the state of the virtual application; and training or evaluating the intelligent agent using the virtual interactive page and the task description information.
[0007] According to the training and evaluation method for an intelligent agent provided by the present invention, training or evaluating the intelligent agent using the virtual interactive page and the task description information includes: using the virtual interactive page and the task description information as inputs to the agent to obtain the atomic action output by the agent; updating the state of the virtual operating system and / or the state of the virtual application based on the atomic action; determining whether a preset interaction stop condition is met; and, if so, evaluating the task execution result of the agent based on the updated state of the virtual operating system and / or the state of the virtual application to obtain the evaluation result of the agent.
[0008] According to the present invention, the method for training and evaluating an intelligent agent further includes: if the interaction stop condition is not met, performing iterative interaction operations until the interaction stop condition is met, the iterative interaction operations including: using the virtual interactive page generated from the updated state of the virtual operating system and / or the updated state of the virtual application as the input of the agent to obtain the atomic action output by the agent in this interaction; and updating the state of the virtual operating system and / or the state of the virtual application based on the atomic action output by the agent in this interaction.
[0009] According to the training and evaluation method for an intelligent agent provided by the present invention, evaluating the task execution result of the intelligent agent based on the updated state of the virtual operating system and / or the state of the virtual application to obtain the evaluation result of the intelligent agent includes: calling a preset structured state interface to obtain the state tree of the virtual operating system and / or virtual application; and evaluating the task execution result of the agent based on that state tree to obtain the evaluation result of the agent.
[0010] According to the training and evaluation method for an intelligent agent provided by the present invention, evaluating the task execution result of the intelligent agent based on the state tree of the virtual operating system and / or virtual application to obtain the evaluation result of the intelligent agent includes: comparing the values in at least a portion of the key-value pairs in the state tree of the virtual operating system and / or virtual application with their corresponding target values; and, if every compared value matches its target value, determining that the evaluation result is that the agent's task execution succeeded, or, if at least some of the compared values do not match, determining that the evaluation result is that the agent's task execution failed.
[0011] According to the training and evaluation method for an intelligent agent provided by the present invention, updating the state of the virtual operating system and / or the state of the virtual application based on the atomic action includes: querying a preset navigation configuration file to determine the jump rules for all interactive components in the virtual interactive page; identifying the target interactive component in the virtual interactive page that corresponds to the atomic action; and determining the value of the current page route in the state tree based on the action type in the atomic action and the jump rule of the target interactive component, the value of the current page route being used to indicate the virtual interactive page after the jump.
[0012] According to the present invention, the method for training and evaluating an intelligent agent further includes: when the atomic action carries input data, using the input data to update, in the state tree, the business data corresponding to the virtual interactive page after the jump.
[0013] According to the training and evaluation method for an intelligent agent provided by the present invention, initializing the state of the virtual simulation environment defined by front-end code includes: initializing the state of the virtual operating system and the virtual application to a basic initial state; and adjusting the basic initial state of the virtual operating system and / or the virtual application according to the task requirements corresponding to the task description information, to obtain the state corresponding to the task requirements.
[0014] According to the present invention, the method for training and evaluating an intelligent agent further includes: parsing the preset navigation configuration file to generate a page route transition graph; performing a path search on the page route transition graph to determine several reference task trajectories; and generating several pieces of reference task description information based on the reference task trajectories, each of which can be used as the task description information.
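By way of non-limiting illustration, the graph construction and path search described above might be sketched as follows. Python is used only for brevity (the environment itself is defined by front-end code). The configuration layout, the names `NAV_CONFIG`, `build_graph`, `enumerate_trajectories`, and `describe`, and the use of a depth-first search over simple paths are all assumptions made for this sketch, not details taken from the invention.

```python
# Hypothetical navigation configuration: route -> components with jump rules.
NAV_CONFIG = {
    "home": [
        {"id": "messages_button", "jumps": {"click": "messages"}},
        {"id": "profile_icon", "jumps": {"click": "profile"}},
    ],
    "messages": [
        {"id": "info_icon", "jumps": {"click": "chat_info"}},
        {"id": "message_input", "jumps": {"click": "messages"}},  # stays on the same page
    ],
}

def build_graph(nav_config):
    """Turn jump rules into a route graph, skipping rules that stay on the same page."""
    graph = {}
    for route, components in nav_config.items():
        for comp in components:
            for action_type, target in comp["jumps"].items():
                if target != route:
                    graph.setdefault(route, []).append((comp["id"], action_type, target))
    return graph

def enumerate_trajectories(graph, start, max_len=3):
    """Depth-first search for simple paths (no repeated page) of up to max_len jumps."""
    results = []

    def dfs(route, visited, path):
        if path:
            results.append(list(path))
        if len(path) == max_len:
            return
        for comp_id, action_type, target in graph.get(route, []):
            if target not in visited:
                path.append((route, action_type, comp_id, target))
                dfs(target, visited | {target}, path)
                path.pop()

    dfs(start, {start}, [])
    return results

def describe(trajectory):
    """Render one trajectory as a template-based natural-language task description."""
    steps = [f"{action} '{comp}' on the {src} page" for src, action, comp, _ in trajectory]
    return (f"Starting from the {trajectory[0][0]} page, "
            + ", then ".join(steps)
            + f" to reach the {trajectory[-1][3]} page.")
```

Each trajectory returned by `enumerate_trajectories(build_graph(NAV_CONFIG), "home")` could then be passed to `describe` to obtain one candidate piece of reference task description information.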
[0015] The present invention also provides a training and evaluation device for intelligent agents, comprising: an initialization unit, used to initialize the state of the virtual simulation environment defined by the front-end code and the task description information, wherein the virtual simulation environment includes a virtual operating system and virtual applications; a page rendering unit, used to generate a virtual interactive page according to the state of the virtual operating system and the state of the virtual application; and a training and evaluation unit, used to train or evaluate the agent using the virtual interactive page and the task description information.
[0016] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the training and evaluation method for an intelligent agent as described above.
[0017] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the training and evaluation method for an intelligent agent as described above.
[0018] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the training and evaluation method for an intelligent agent as described above.
[0019] This invention provides a training and evaluation method, apparatus, electronic device, and storage medium for intelligent agents. First, the state of the virtual operating system and the state of the virtual application, along with task description information, are defined by front-end code. Then, the task description information and virtual interactive pages are used to train or evaluate the intelligent agent. Because the entire virtual simulation environment is defined by front-end code, it can be applied to any development machine, server, or other device that supports front-end code execution. This eliminates the constraints of hardware devices and physical applications, enabling flexible and rapid configuration of the environment state. It can efficiently build diverse intelligent agent training and evaluation scenarios, facilitating intelligent agent training and evaluation in multiple scenarios and improving the accuracy of task execution in different scenarios.
Attached Figure Description
[0020] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0021] Figure 1 is the first flowchart of the training and evaluation method for intelligent agents provided by this invention.
[0022] Figure 2 is the second flowchart of the training and evaluation method for intelligent agents provided by the present invention.
[0023] Figure 3 is a schematic diagram of the structure of the virtual simulator based on front-end code provided by the present invention.
[0024] Figure 4 is a schematic diagram of the structure of the intelligent agent training and evaluation device provided by the present invention.
[0025] Figure 5 is a schematic diagram of the structure of the electronic device provided by the present invention.
Detailed Implementation
[0026] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.
[0027] The training and evaluation method for intelligent agents of this invention is described below with reference to Figures 1 to 3. Figure 1 is the first flowchart of the training and evaluation method for intelligent agents provided by this invention. As shown in Figure 1, the method includes the following steps: Step 101: Initialize the state of the virtual simulation environment defined by the front-end code and the task description information.
[0028] The front-end code can be any front-end development code that can be executed in a browser kernel, such as, but not limited to, HyperText Markup Language (HTML), Cascading Style Sheets (CSS), and JavaScript. Alternatively, the front-end code can be written with a front-end framework. Such code executes across devices and platforms and does not rely on the proprietary development language of a specific physical hardware platform or operating system. Because the virtual simulation environment of this invention is defined by the front-end code, the intelligent agent training and evaluation method provided by this invention can run on any development machine, server, or other device that supports front-end code execution (e.g., one that supports Node.js). The virtual operating system and the collection of several virtual applications can be hosted in a browser; for example, the method can be executed by the browser kernel.
[0029] A virtual simulation environment includes a virtual operating system and virtual applications. The state of the virtual simulation environment can include the state of the virtual operating system and the state of the virtual applications. The virtual operating system can be considered a digital virtualization of the operating system functions and states of a mobile device. For example, the state of the virtual operating system may include, but is not limited to, page routes, status bar status, navigation bar style, system settings, etc. Virtual applications are digital virtualizations of the application functions and states on a mobile device. For example, the state of a virtual application may include, but is not limited to, page routes, the state of interactive components, business data status, etc. Pages in the virtual operating system and / or virtual applications can be numbered to obtain the page route of each page. Mobile devices may include, but are not limited to, mobile phones, computers, etc. "Several" means at least one.
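By way of non-limiting illustration, the state described above could be held in one nested key-value structure. The sketch below uses Python only for brevity (the environment itself is defined by front-end code), and every key name in it, such as `current_route`, `status_bar`, and `business_data`, is a hypothetical choice rather than a name given by the invention.

```python
# Hypothetical state tree; all keys and values are illustrative assumptions.
STATE_TREE = {
    "os": {
        "current_route": "desktop",
        "status_bar": {"battery": 100, "network": "online"},
        "navigation_bar": {"style": "three_button"},
        "settings": {"do_not_disturb": False},
    },
    "apps": {
        "chat": {
            "current_route": "home",
            "components": {"message_input": {"text": ""}},
            "business_data": {"sessions": {"alice": {"muted": False}}},
        },
    },
}

def get_path(state, dotted_path):
    """Read one value from the nested state using a dotted key path."""
    node = state
    for key in dotted_path.split("."):
        node = node[key]
    return node
```

A dotted path such as `apps.chat.business_data.sessions.alice.muted` then addresses a single piece of business data, which is convenient when states are later compared against target values.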
[0030] Task description information can be used to define the task to be performed by the agent. A task can include starting conditions, target action / state, executable trajectory or path constraints, and decision-making methods. The task description information can be in the form of natural language text. For example, the task description information could be "Enter the chat information page of a certain session and switch 'Do Not Disturb' from off to on."
[0031] The initialization of the virtual simulation environment in step 101 can be achieved by setting the virtual simulation environment to a basic initial state, for example, by setting the current page route of the virtual operating system to the desktop, the system network status to online, the current page route of the virtual application to the homepage, all page input boxes to empty, and business data to default initial values. The initialization of the task description information in step 101 can be achieved by selecting one piece of reference task description information from several candidates as the task description information.
[0032] Optionally, before each training or evaluation of the agent, step 101 can be executed to restore the state of the virtual simulation environment to the defined initial state, ensuring that the evaluation results of different agents at different times are fair and comparable.
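By way of non-limiting illustration, initialization and per-run reset might be sketched as below, in Python for brevity. The names `BASE_STATE`, `REFERENCE_TASKS`, and `initialize`, the dotted-path `overrides` argument used for task-specific adjustment, and the random selection of a task are assumptions of this sketch.

```python
import copy
import random

# Hypothetical basic initial state, following the example values in the text.
BASE_STATE = {
    "os": {"current_route": "desktop", "network": "online"},
    "app": {"current_route": "home", "inputs": {}, "business_data": {"muted": False}},
}

REFERENCE_TASKS = [
    "Enter the chat information page of a certain session and switch "
    "'Do Not Disturb' from off to on.",
]

def initialize(overrides=None, rng=None):
    """Return a fresh (state, task) pair without mutating BASE_STATE."""
    state = copy.deepcopy(BASE_STATE)  # every run starts from the same baseline
    for dotted_path, value in (overrides or {}).items():
        *parents, leaf = dotted_path.split(".")
        node = state
        for key in parents:
            node = node[key]
        node[leaf] = value  # task-specific adjustment of the basic state
    task = (rng or random).choice(REFERENCE_TASKS)
    return state, task
```

Because each call deep-copies the baseline, changes made during one episode cannot leak into the next, which is what keeps results from different agents and different times comparable.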
[0033] Step 102: Generate a virtual interactive page according to the state of the virtual operating system and the virtual application.
[0034] A virtual interactive page can be considered as a virtual visual page that simulates the user interface of a mobile device, generated by front-end code based on the current state of the virtual simulation environment.
[0035] The virtual interactive page is visual input information that the intelligent agent can recognize. Its visual appearance is consistent with the real interactive interface of a mobile device, and it includes the status bar and navigation bar of the virtual operating system as well as the application icons, interactive components, and business data display of the virtual application.
[0036] In some application scenarios, step 102 can be implemented as follows: the front-end code pre-defines rendering rules for the virtual interactive page, establishing a one-to-one mapping between these rules and the states of the virtual operating system and the virtual application. After initializing the virtual simulation environment, the front-end code reads all current states of the virtual operating system and the virtual application in real time. First, it parses the state of the virtual operating system, which may include, but is not limited to, page routes, the battery / signal display in the status bar of the corresponding page, the icon style of the navigation bar, and the show / hide status of system pop-ups. Then, it parses the state of the virtual application, which may include, but is not limited to, page routes and the page layout style corresponding to those routes, the position and status of interactive components, and the visualization format of business data. Finally, according to the rendering rules of mobile devices, the states of the virtual operating system and the virtual application are merged to obtain the virtual interactive page. This virtual interactive page can be output in various formats, such as visual image formats and pixel matrix formats, that the intelligent agent can recognize.
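By way of non-limiting illustration, the state-to-page mapping might be sketched as one rendering rule per page route, with the operating system status bar merged on top. The sketch is in Python for brevity; the rule table `RENDER_RULES`, the bounding-box layout, and the toy `rasterize` function that stands in for a real pixel matrix are all assumptions.

```python
# Hypothetical rendering rules: one layout function per page route.
def render_home(app_state):
    return [{"id": "messages_button", "type": "button", "box": (0, 1, 4, 2)}]

def render_messages(app_state):
    return [{"id": "message_input", "type": "input", "box": (0, 1, 4, 2),
             "text": app_state["inputs"].get("message_input", "")}]

RENDER_RULES = {"home": render_home, "messages": render_messages}

def render_page(state):
    """Merge the OS status bar and the current app page into one page description."""
    os_state, app_state = state["os"], state["app"]
    status_bar = {"id": "status_bar", "type": "status", "box": (0, 0, 4, 1),
                  "battery": os_state["battery"], "network": os_state["network"]}
    components = [status_bar] + RENDER_RULES[app_state["current_route"]](app_state)
    return {"route": app_state["current_route"], "components": components}

def rasterize(page, width=4, height=2):
    """Toy pixel matrix: each cell holds the index of the component covering it."""
    pixels = [[-1] * width for _ in range(height)]
    for index, comp in enumerate(page["components"]):
        x0, y0, x1, y1 = comp["box"]
        for y in range(y0, min(y1, height)):
            for x in range(x0, min(x1, width)):
                pixels[y][x] = index
    return pixels
```

Since the page is a pure function of the state, re-rendering after any state update always yields a page consistent with that state.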
[0037] Step 103: Use the virtual interactive page and task description information to train or evaluate the intelligent agent.
[0038] An intelligent agent can be an artificial intelligence model capable of autonomously executing mobile device operation tasks. It is an intelligent decision-making model that can receive input from virtual interactive pages and task description information, and output atomic actions through visual recognition and semantic understanding. It can optimize its own mobile device operation strategy through training and is suitable for the execution of operation tasks in various mobile device applications, such as clicks, swipes, and input operations in social, shopping, and office applications.
[0039] For example, the generated virtual interactive page and task description information are transmitted to the intelligent agent as joint input. After receiving the virtual interactive page and task description information, the intelligent agent performs visual recognition on the virtual interactive page, semantic understanding on the task description information, and outputs atomic actions for the current task. The atomic actions are the basic operation instructions of the intelligent agent on the virtual interactive page, including but not limited to clicks, swipes, and input.
[0040] Optionally, during the evaluation of the agent, the evaluation result can be obtained by matching the atomic actions against target atomic actions. The evaluation result can be either "task execution successful" or "task execution failed." Alternatively, the state of the virtual simulation environment can be updated based on the atomic actions, and the evaluation result can then be determined from the updated state. Optionally, during the training of the agent, the model parameters of the agent can be adjusted using the evaluation results.
[0041] Once trained, the agent can be deployed on mobile devices. By receiving voice or text commands and screenshots of the user interface of the mobile device, it can generate atomic actions for the mobile device and then autonomously execute tasks on the mobile device based on these atomic actions.
[0042] In the above scheme, the state of the virtual operating system and the state of the virtual application, as well as the task description information, are first initialized by the front-end code. Then, the task description information and the virtual interactive page are used to train or evaluate the agent. Because the entire virtual simulation environment is defined by the front-end code, it can be applied to any development machine, server, or other device that supports the execution of the front-end code. It breaks free from the constraints of hardware devices and physical applications, and enables flexible and rapid configuration of the environment state. It can efficiently build diverse agent training and evaluation scenarios, and improve the construction efficiency and scenario adaptability of the simulation environment.
[0043] It is understood that when the embodiments of the present invention are applied to specific products or technologies, the use of data involving user privacy, such as the state of the operating system, the state of the application, and task description information, requires the user's authorization or consent. Furthermore, the collection, use, and processing of such data, as well as the training, deployment, and invocation of intelligent agents, must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
[0044] In one possible embodiment, step 103 described above may include, for example, the following steps shown in Figure 2: Step 201: Use the virtual interactive page and task description information as input to the agent to obtain the atomic actions output by the agent.
[0045] Atomic actions are basic operation instructions output by an intelligent agent based on virtual interactive pages and task description information for a virtual simulation environment. Atomic actions can contain core elements such as action type, operation object, and operation parameters. Action types include, but are not limited to, clicking, swiping, input, and system gestures (back, home, pull-down notification bar, etc.).
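By way of non-limiting illustration, an atomic action with the core elements named above might be represented as follows, in Python for brevity. The field names, the `ACTION_TYPES` vocabulary, and the raw dictionary format accepted by `parse_action` are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

# Hypothetical action vocabulary based on the action types listed in the text.
ACTION_TYPES = {"click", "swipe", "input", "back", "home",
                "pull_down_notifications", "terminate"}

@dataclass(frozen=True)
class AtomicAction:
    action_type: str                              # what kind of operation
    position: Optional[Tuple[int, int]] = None    # where on the page it is applied
    params: Dict[str, Any] = field(default_factory=dict)  # e.g. input text

    def __post_init__(self):
        if self.action_type not in ACTION_TYPES:
            raise ValueError(f"unknown action type: {self.action_type}")

def parse_action(raw: Dict[str, Any]) -> AtomicAction:
    """Structured parsing of a raw agent output into an AtomicAction."""
    position = tuple(raw["position"]) if "position" in raw else None
    return AtomicAction(raw["type"], position, dict(raw.get("params", {})))
```

Rejecting unknown action types at parse time gives the environment a clear point at which to flag an invalid agent output.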
[0046] For example, after receiving the joint input, the agent extracts pixel and component features from the virtual interactive page to identify interactive components, their positions, and states. It then segments the task description information and performs intent recognition to determine the task's objective, target, and execution requirements. Finally, it fuses and analyzes the visual recognition and semantic understanding results to output atomic actions specific to the task.
[0047] Step 202: Update the state of the virtual operating system and / or the state of the virtual application based on atomic actions.
[0048] In some application scenarios, the front-end code obtains the atomic actions output by the intelligent agent through a pre-defined action receiving interface. It then performs structured parsing of these atomic actions, extracting elements such as the action type and the operation object. Based on the operation object, it determines whether the atomic action targets an interactive component of the virtual operating system or of a virtual application within the virtual interactive page. If the atomic action targets an interactive component in the virtual operating system (such as clicking the virtual home button or pulling down the virtual notification bar), the corresponding state of the virtual operating system is updated according to the action type. For example, clicking the virtual home button updates the current page route of the virtual operating system to the virtual desktop. If the atomic action targets an interactive component in the virtual application (such as clicking the application message button), the corresponding state of the virtual application is updated according to the action type. For example, clicking the application message button updates the current page route of the virtual application to the message page. If the atomic action involves interactive components of both the virtual operating system and the virtual application, the states of both are updated simultaneously.
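By way of non-limiting illustration, the dispatch between operating system state and application state might look like the sketch below, in Python for brevity. The action fields `target_owner`, `type`, and `target_id`, and the state key `notification_shade_open`, are hypothetical names introduced for this example.

```python
def apply_action(state, action):
    """Update OS or app state depending on which layer owns the target component."""
    owner, kind = action["target_owner"], action["type"]
    if owner == "os":
        if kind == "home":
            # Clicking the virtual home button returns to the virtual desktop.
            state["os"]["current_route"] = "desktop"
        elif kind == "pull_down_notifications":
            state["os"]["notification_shade_open"] = True
    elif owner == "app":
        if kind == "click" and action.get("target_id") == "messages_button":
            # Clicking the message button opens the message page of the app.
            state["app"]["current_route"] = "messages"
    else:
        raise ValueError(f"unknown target owner: {owner}")
    return state
```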
[0049] Alternatively, in one possible embodiment, step 202 above may include the following steps: First, query the preset navigation configuration file to determine the jump rules for all interactive components in the virtual interactive page.
[0050] A navigation configuration file can be a structured configuration file pre-written in the front-end code, used to explicitly define the jump rules and page state configurations for all interactive components in a virtual interactive page. For example, the navigation configuration file may include at least one of the following: page route values, business data values, the position of each interactive component on each page, and jump rules, etc. Here, "page" refers to predefined operating system pages and / or application pages.
[0051] Interactive components refer to components in a virtual interactive page that can be manipulated by an intelligent agent through atomic actions. Interactive components may include, but are not limited to, at least one of the following: buttons, input boxes, switches, icons, list items, navigation bar options, etc. Each interactive component has a unique identifier.
[0052] Based on the value of the current virtual interactive page (its page route), the configuration information that matches the current virtual interactive page is filtered from the navigation configuration file, and the identifiers, positions, and corresponding jump rules of all interactive components on the page are extracted.
[0053] Then, determine the target interactive component in the virtual interactive page corresponding to the atomic action.
[0054] The position of the target interactive component in the virtual interactive page corresponds to the execution position of the action in the atomic action.
[0055] Next, based on the action type in the atomic action and the jump rules of the target interactive component, the value of the current page route in the state tree is determined. The value of the current page route is used to indicate the virtual interactive page after the jump.
[0056] The value of the current page route can be considered as an identifier in the state tree representing the most recently rendered page. The value of the current page route could be "Homepage," "Messages Page," "Personal Center Page," etc. Because the virtual interactive page after a jump is rendered from the state of the virtual operating system and the virtual application, it can be indicated by the value of the current page route in the latest state tree.
[0057] Based on the action type of the atomic action (such as click or swipe) and the corresponding jump rule of the target interactive component, the page route targeted by this operation is determined, which gives the value of the virtual interactive page after the jump. This value is written into the "current page route" key-value pair in the state tree, thereby updating the value of the current page route. It can be understood that if the jump rule of an interactive component specifies the same page before and after the action, the component does not actually involve a page jump, so the value of the current page route does not change. For example, the interactive component can be a text input box on a chat page: clicking the text input box can send chat content, but the page remains the same chat page.
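By way of non-limiting illustration, the configuration lookup, target identification by position, and route update might be sketched as follows, in Python for brevity. The layout of `NAV_CONFIG`, the bounding-box hit test, and the function names are assumptions of this sketch.

```python
# Hypothetical navigation configuration: per route, each component has an
# identifier, a bounding box (x0, y0, x1, y1), and jump rules per action type.
NAV_CONFIG = {
    "home": [
        {"id": "messages_button", "box": (0, 0, 100, 50), "jumps": {"click": "messages"}},
        {"id": "profile_icon", "box": (100, 0, 200, 50), "jumps": {"click": "profile"}},
    ],
    "messages": [
        # The rule points back to the same page, so no page jump occurs.
        {"id": "message_input", "box": (0, 400, 200, 450), "jumps": {"click": "messages"}},
    ],
}

def find_target(route, position):
    """Return the component on the current page whose box contains the position."""
    x, y = position
    for comp in NAV_CONFIG.get(route, []):
        x0, y0, x1, y1 = comp["box"]
        if x0 <= x < x1 and y0 <= y < y1:
            return comp
    return None

def update_route(state_tree, action_type, position):
    """Apply the jump rule of the targeted component; return False if none applies."""
    comp = find_target(state_tree["current_route"], position)
    if comp is None:
        return False
    next_route = comp["jumps"].get(action_type)
    if next_route is None:
        return False
    state_tree["current_route"] = next_route
    return True
```

A `False` return marks an action that hit no interactive component or one with no rule for that action type, which a caller could treat as an abnormal state.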
[0058] In one possible embodiment, the method further includes: When an atomic action carries input data, the corresponding business data in the virtual interactive page after the jump in the state tree is updated using the input data.
[0059] When the atomic action is an input-type action, the input data needs to be written into the relevant business data fields of the virtual interactive page after the jump. The input data can include, but is not limited to, text, numbers, and symbols.
[0060] In some application scenarios, after receiving an atomic action, it is determined whether it carries input data. If it does, the corresponding business data in the virtual interactive page after the jump in the state tree is updated using the input data.
[0061] In the above scheme, when atomic actions carry input data, the corresponding business data in the state tree is updated using the input data. In this process, not only is page routing jump supported, but also dynamic writing of business data is supported, so as to achieve a comprehensive update of the state of the virtual simulation environment.
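By way of non-limiting illustration, writing input data into the business data of the post-jump page might be sketched as follows, in Python for brevity. The binding table `FIELD_BINDINGS`, which maps a page route and component identifier to a business data field, is a hypothetical device introduced for this example.

```python
# Hypothetical mapping from (page route, component id) to a business data field.
FIELD_BINDINGS = {
    ("messages", "message_input"): "draft_text",
    ("profile", "nickname_input"): "nickname",
}

def apply_input(state_tree, target_id, input_data, bindings=FIELD_BINDINGS):
    """Write input data into the business data field of the current (post-jump) page."""
    if input_data is None:
        return state_tree  # the action carries no input data, nothing to write
    route = state_tree["current_route"]
    field_name = bindings[(route, target_id)]
    state_tree["business_data"].setdefault(route, {})[field_name] = input_data
    return state_tree
```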
[0062] Step 203: Determine whether the preset interaction stop condition is met.
[0063] The preset interaction stop conditions may include, but are not limited to, the following. First, during the execution of the task corresponding to the task description information, the number of interactions with the agent reaches a preset number. Second, the action type output by the agent is a termination action, by which the agent declares that the task is completed and ceases execution. Third, the virtual simulation environment enters an abnormal state, for example because the component targeted by the atomic action is not interactive or because the business state is unreasonable. Fourth, the maximum number of interactions specified by the task corresponding to the task description information is reached. The number of interactions may be the number of times a virtual interactive page is sent to the agent or the number of times atomic actions are received from the agent.
[0064] If the judgment result is negative, then execute the iterative interaction operation shown in step 204 until the interaction stop condition is met; if the judgment result is positive, then execute step 205.
[0065] Step 204: Use the virtual interactive page generated from the updated state of the virtual operating system and / or the state of the virtual application as input to the agent to obtain the atomic action output by the agent in the current interaction; then update the state of the virtual operating system and / or the state of the virtual application based on that atomic action.
[0066] For the method of generating a virtual interactive page from the updated state of the virtual operating system and / or the state of the virtual application, refer to step 102 above; it is not repeated here.
[0067] For the method of updating the state of the virtual operating system and / or the state of the virtual application based on the atomic action output by the agent in the current interaction, refer to step 202 above; it is not repeated here.
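The iterative interaction can be pictured as a simple loop. The sketch below is a hedged illustration: `run_episode`, `render`, `step`, and the scripted agent are hypothetical stand-ins for the page generation, state update, and agent described in the text, and only two of the stop conditions (a termination action and a step limit) are modelled.

```python
def run_episode(agent, render, step, initial_state, task, max_steps=20):
    """Alternate page rendering, agent action, and state update until a stop condition holds."""
    state = initial_state
    trajectory = []
    for _ in range(max_steps):          # stop condition: interaction limit
        page = render(state)            # page generated from the current state
        action = agent(page, task)      # atomic action output by the agent
        trajectory.append(action)
        if action["type"] == "terminate":  # stop condition: termination action
            break
        state = step(state, action)     # update the state from the atomic action
    return state, trajectory


# Toy environment and scripted agent, for illustration only.
def render(state):
    return {"route": state["route"]}


def step(state, action):
    return {**state, "route": action["target"]}


def scripted_agent(page, task):
    if page["route"] == "home":
        return {"type": "click", "target": "messages"}
    return {"type": "terminate"}


final_state, trajectory = run_episode(
    scripted_agent, render, step, {"route": "home"}, "Open the message page"
)
```

The final state returned by the loop is what the subsequent evaluation step inspects.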
[0068] Step 205: Based on the updated state of the virtual operating system and / or the state of the virtual application, evaluate the task execution results of the agent to obtain the evaluation results of the agent.
[0069] Specifically, the task execution results of the agent can be evaluated using the state of the virtual operating system and / or virtual application as updated after the last interaction with the agent.
[0070] In some application scenarios, after determining that the preset interaction stopping conditions are met, the updated state of the virtual operating system and / or the state of the virtual application are compared and analyzed with the preset target state of this task. Then, the evaluation result of the intelligent agent is determined based on the results of the comparison and analysis. The dimensions of the comparison and analysis may include, but are not limited to, at least one of the following: the value of the current page route, component states, and business data and other states related to the task objective.
[0071] In one possible embodiment, step 205 above may include the following steps: first, calling a preset structured state interface to obtain the state tree of the virtual operating system and / or virtual application; then, based on the state tree of the virtual operating system and / or virtual application, evaluating the task execution results of the agent to obtain the evaluation results of the agent.
[0072] The structured state interface can be considered a pre-written, programmatic interface for reading all state data of the virtual operating system and / or virtual applications in a virtual simulation environment at once. The state tree can be considered a structured model that constructs the states of the virtual operating system and / or virtual applications according to hierarchical relationships. Optionally, the state tree can contain all state data of both the virtual operating system and virtual applications simultaneously, or one state tree can correspond to the virtual operating system, and another state tree can correspond to the virtual applications.
[0073] In some application scenarios, the above-mentioned evaluation of the agent's task execution results based on the state tree of the virtual operating system and / or virtual application can be achieved by comparing and analyzing the data of the entire state tree or the partial states corresponding to the current task in the state tree with the corresponding target states to obtain the evaluation results of the agent.
[0074] In the above scheme, the state tree of the virtual operating system and / or virtual application is obtained by calling the structured state interface, which realizes the fast and accurate reading of the full state data of the virtual simulation environment and reduces the errors caused by visual recognition or interface parsing.
[0075] In one possible embodiment, evaluating the task execution result of the intelligent agent based on the state tree of the virtual operating system and / or virtual application to obtain the evaluation result of the intelligent agent may include the following steps: comparing the values in at least some key-value pairs in the state tree of the virtual operating system and / or virtual application with the corresponding target values respectively; if all compared values match, confirming that the evaluation result of the intelligent agent is that the task execution succeeded; or, if at least one compared value does not match, confirming that the evaluation result of the intelligent agent is that the task execution failed.
[0076] A key-value pair records a state and its corresponding value. At least some key-value pairs can be related to the current task. For example, if the task description is "Open a virtual social application and enter the message page", then the key-value pairs to be compared include the "current page route-value" key-value pairs. The target value can be a standard value pre-set for this task, corresponding to a specific key-value pair in the state tree.
[0077] If any key-value pair does not match the target value, it means that the atomic action output by the agent has not met the target requirements of this task, and the evaluation result of the agent is confirmed as the agent's task execution failure.
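The key-value comparison can be sketched as follows. The dotted-path addressing of the state tree and the names `read_path` and `evaluate` are assumptions made for illustration; the specification only requires that selected key-value pairs be compared with their target values.

```python
_MISSING = object()  # sentinel for keys that are absent from the state tree


def read_path(tree, path):
    """Read a value from a nested dict using a dotted path such as 'app.social.route'."""
    node = tree
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def evaluate(state_tree, targets):
    """Succeed only if every target key-value pair matches the state tree."""
    mismatched = [
        path for path, expected in targets.items()
        if read_path(state_tree, path) != expected
    ]
    return {"success": not mismatched, "mismatched": mismatched}


state_tree = {"app": {"social": {"route": "messages"}}}
result = evaluate(state_tree, {"app.social.route": "messages"})
```

Reporting the mismatched paths alongside the verdict is an optional convenience for debugging failed tasks.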
[0078] In one possible embodiment, step 101 above may include the following steps: the states of the virtual operating system and the virtual applications are initialized to a basic initial state; then, based on the task requirements corresponding to the task description information, the basic initial state of the virtual operating system and / or virtual applications is adjusted to obtain the state corresponding to the task requirements.
[0079] The basic initial state can be a state preset for the virtual operating system and virtual applications. Task requirements are the task-specific requirements on the state of the virtual simulation environment, determined by the task description information, such as the account balance required for a money transfer task or the chat history required for a messaging task.
[0080] For example, intent recognition is performed on the task description information to obtain the task requirements. Based on the task requirements, the basic initial state is adjusted. For instance, if the task requirement is "the virtual shopping application account balance is 1000 yuan", then the account balance value in the virtual application state is modified to 1000 yuan.
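A minimal sketch of the two-stage initialization is shown below. It assumes the task requirements have already been extracted into a nested dictionary (the intent recognition step is not modelled), and `BASE_STATE`, `deep_merge`, and `init_environment` are hypothetical names.

```python
from copy import deepcopy

# Hypothetical basic initial state shared by all tasks.
BASE_STATE = {
    "os": {"network": "online"},
    "apps": {"shop": {"balance": 0, "orders": []}},
}


def deep_merge(base, overrides):
    """Return a copy of base with the nested overrides applied on top."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def init_environment(task_requirements):
    """Start from the basic initial state, then adjust it to the task requirements."""
    return deep_merge(BASE_STATE, task_requirements)


# Task requirement: the virtual shopping application account balance is 1000 yuan.
state = init_environment({"apps": {"shop": {"balance": 1000}}})
```

Only the fields named by the task are overridden; everything else keeps its basic initial value.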
[0081] In one possible embodiment, the method further includes: First, the preset navigation configuration file is parsed to generate a page routing transition graph. For example, the navigation configuration file is parsed to obtain the values of all page routes, the navigation rules for each interactive component, and the navigation relationships between pages. Then, based on the values of all page routes, the navigation rules for each interactive component, and the navigation relationships between pages, the page routing transition graph is drawn. This page routing transition graph can be a directed graph, where nodes represent specific page routes and edges represent navigation relationships between pages. In other words, by parsing the navigation configuration file, the routing relationships of all pages and the navigation rules for interactive components in the virtual simulation environment can be transformed into a directed graph, providing a convenient and intuitive display of all page navigation logic in the virtual simulation environment.
[0082] Then, a path search is performed in the page routing transition graph to determine several reference task trajectories. The path search method can include, but is not limited to, depth-first search or breadth-first search. A reference task trajectory is a complete page jump path from the starting page route to the target page route, obtained through path search in the page routing transition graph. Optionally, the reference task trajectory may include all page route nodes on the path and the interactive components that trigger each page jump.
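The graph construction and path search described above can be illustrated with a short depth-first search. The configuration format (`from`, `component`, `to`) and the route names are hypothetical; a real navigation configuration file may be structured differently.

```python
# Hypothetical navigation configuration: one jump rule per interactive component.
NAV_CONFIG = [
    {"from": "home", "component": "btn_social", "to": "social/chats"},
    {"from": "social/chats", "component": "tab_messages", "to": "social/messages"},
    {"from": "social/chats", "component": "chat_item", "to": "social/chat"},
    {"from": "social/chat", "component": "btn_info", "to": "social/chat_info"},
    {"from": "social/chat", "component": "btn_back", "to": "social/chats"},
]


def build_graph(config):
    """Directed graph: route -> list of (component, next route) edges."""
    graph = {}
    for rule in config:
        graph.setdefault(rule["from"], []).append((rule["component"], rule["to"]))
        graph.setdefault(rule["to"], [])
    return graph


def find_trajectories(graph, start, goal):
    """Depth-first search for all cycle-free paths from start to goal."""
    results = []

    def dfs(route, steps, visited):
        if route == goal:
            results.append(list(steps))
            return
        for component, nxt in graph[route]:
            if nxt not in visited:
                steps.append((route, component, nxt))
                dfs(nxt, steps, visited | {nxt})
                steps.pop()

    dfs(start, [], {start})
    return results


graph = build_graph(NAV_CONFIG)
trajectories = find_trajectories(graph, "home", "social/chat_info")
```

Each trajectory is a list of (source route, component, destination route) steps, so it records both the page route nodes and the components that trigger the jumps.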
[0083] Next, based on each reference task trajectory, several reference task descriptions are generated. Each reference task description can serve as a separate task description.
[0084] One way to generate several reference task descriptions based on each reference task trajectory is to generate a structured definition of the task based on the reference task trajectory, including, for example, the task identifier, reference trajectory, and target action. Then, task description information can be generated based on the structured definition. For example, the structured definition of the task can adopt a declarative task framework, reducing the task definition to "declaring intent + filling in fields." In some application scenarios, complex tasks can also be manually defined; for example, the task description information could be "Help me find the highest-rated restaurant within 3 kilometers and what its rating is."
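As a rough illustration of "declaring intent + filling in fields", the sketch below derives a structured task definition from one reference trajectory and renders it into a description with a fixed template. The `TaskDefinition` fields follow the three items named above; the trajectory format, the template wording, and the function names are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

Step = Tuple[str, str, str]  # (source route, component, destination route)


@dataclass
class TaskDefinition:
    task_id: str
    reference_trajectory: List[Step]
    target_action: str  # component that triggers the final jump


def task_from_trajectory(task_id, trajectory):
    """Fill in the structured definition from a reference task trajectory."""
    return TaskDefinition(task_id, list(trajectory), trajectory[-1][1])


def describe(task):
    """Render task description information from the structured definition."""
    start = task.reference_trajectory[0][0]
    goal = task.reference_trajectory[-1][2]
    return f"Starting from page '{start}', go to page '{goal}'."


trajectory = [
    ("home", "btn_social", "social/chats"),
    ("social/chats", "tab_messages", "social/messages"),
]
task = task_from_trajectory("nav-0001", trajectory)
description = describe(task)
```

A template is only one way to produce the description text; the specification leaves the generation method open.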
[0085] In the above scheme, the page routing transition graph is generated by parsing the navigation configuration file, realizing the visualization and structuring of the page jump logic in the virtual simulation environment. On this basis, the reference task trajectory is determined by path search in the page routing transition graph, and reference task description information is generated according to the reference task trajectory. This realizes the batch automatic generation of task description information without manual writing, which not only reduces the cost of task construction and maintenance, but also significantly improves the generation efficiency of task description information. At the same time, it also ensures the standardization and diversity of task description information, providing rich task materials for the training of intelligent agents in diverse scenarios, and further improving the accuracy of intelligent agents in multi-scenario task execution.
[0086] In one possible embodiment, the agent training and evaluation method provided by this invention can be applied to a virtual simulator based on front-end code. As shown in Figure 3, the virtual simulator 300 may include a virtual system shell module 301, an application logic modeling module 302, an action receiving and atomic gesture injection module 303, a task state setting and environment reset module 304, and a task automation evaluation module 305. After the virtual simulator 300 is connected to the intelligent agent 200, the intelligent agent 200 can be trained and evaluated.
[0087] Among them, the virtual system shell module 301 is used to simulate the operating system-level interface and behavior of mobile devices (such as desktop, status bar, multitasking switching and system return gestures) in the browser, and is responsible for the life cycle management of virtual applications such as startup and shutdown, providing the intelligent agent with an operation boundary consistent with the real mobile device.
[0088] The application logic modeling module 302 defines all page routes, page states, and jump rules between components within the application through navigation configuration files, and provides configuration data for all environmental data in the virtual simulation environment.
[0089] The action receiving and atomic gesture injection module 303 is responsible for receiving the original interaction instructions (such as click coordinates, swipe direction, etc.) output by the intelligent agent based on visual observation, and triggering the corresponding jump or state change defined in the application logic modeling module to ensure the physical determinism of the interaction feedback.
[0090] The task state setting and environment reset module 304 supports injecting initial data (such as preset account balances, chat logs, etc.) directly into the environment data configuration to meet different task requirements. It also supports restoring the system state and application data to the initial snapshot (the basic initial state) immediately after a task is completed, so that the environment can be reset within seconds.
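Data injection and snapshot-based reset can be sketched with a serialized copy of the basic initial state. The `Environment` class, its methods, and the dotted-path convention are hypothetical; JSON is used here only because the state tree is described as serializable.

```python
import json


class Environment:
    """Toy environment holding a state tree plus a serialized initial snapshot."""

    def __init__(self, base_state):
        self._snapshot = json.dumps(base_state)  # snapshot of the basic initial state
        self.state = json.loads(self._snapshot)

    def inject(self, path, value):
        """Write initial data at a dotted path, creating intermediate nodes as needed."""
        *parents, leaf = path.split(".")
        node = self.state
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value

    def reset(self):
        """Restore system state and application data to the initial snapshot."""
        self.state = json.loads(self._snapshot)


env = Environment({"apps": {"bank": {"balance": 0}}})
env.inject("apps.bank.balance", 1000)
balance_during_task = env.state["apps"]["bank"]["balance"]
env.reset()
balance_after_reset = env.state["apps"]["bank"]["balance"]
```

Because the reset is a deserialization of an in-memory string, it does not depend on restarting a device or simulator.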
[0091] The task automation evaluation module 305 incorporates a deterministic verifier 306. Instead of relying on the agent's operational process, the task automation evaluation module 305 directly retrieves the ground-truth state held in the background from the application logic modeling module 302 to determine whether the task goal has ultimately been achieved. This avoids the hallucination problem associated with visual judgments and provides a reward signal with a high signal-to-noise ratio for reinforcement learning.
[0092] Take the following task description as an example: "Enter the chat information page of a certain session and switch 'Do Not Disturb' from off to on." The condition for successful task execution is that, in the ground-truth state corresponding to that session, the "Do Not Disturb" switch changes from off to on.
[0093] First, before evaluation, the virtual simulation environment is initialized, and the system service parameters and application initial parameters are configured according to the task to ensure that the starting point of the task is consistent.
[0094] The agent's observation consists primarily of the virtual interactive page and the task description information, and its output is the type of atomic action and the corresponding parameters (e.g., click coordinates). Upon receiving this atomic action, the virtual simulator executes it via atomic gesture injection, thereby toggling the "Do Not Disturb" switch.
[0095] After the action is executed, the deterministic verifier 306 determines whether the task succeeded based on the ground-truth state, thereby reducing the "evaluation hallucination" caused by relying on language model judgment or pixel matching. The task is considered successful if the success condition is met; otherwise, it is considered a failure. At the same time, the trajectory, screenshots, state summaries, time consumption, and number of steps are recorded, and metrics such as success rate are calculated for side-by-side comparison across tasks.
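The metric calculation can be sketched as a simple aggregation over recorded episodes. The record fields (`task`, `success`, `steps`) and the function `summarize` are hypothetical, and only two metrics are shown.

```python
def summarize(records):
    """Aggregate per-episode records into a success rate and an average step count."""
    total = len(records)
    if total == 0:
        return {"episodes": 0, "success_rate": 0.0, "avg_steps": 0.0}
    successes = sum(1 for r in records if r["success"])
    return {
        "episodes": total,
        "success_rate": successes / total,
        "avg_steps": sum(r["steps"] for r in records) / total,
    }


records = [
    {"task": "dnd_on", "success": True, "steps": 3},
    {"task": "dnd_on", "success": False, "steps": 5},
    {"task": "open_messages", "success": True, "steps": 4},
    {"task": "open_messages", "success": False, "steps": 8},
]
summary = summarize(records)
```

Grouping the records by task before calling `summarize` would give the per-task figures needed for comparison across tasks.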
[0096] In the above solution, a virtual simulation environment based on front-end code is built within the browser, making this virtual simulation environment programmable. That is, the entire state of the virtual simulation environment (the state of the virtual operating system and virtual applications) is a transparent, readable, writable, and serializable state tree. Any initial state can be freely defined (account balance, orders, chat history, system settings, etc.), and any task scenario can be freely constructed (including high-risk operations such as payments and account cancellations that cannot be performed on a real device), with task results directly verified through code.
[0097] Furthermore, because virtual simulation environments can be built in a browser, they can support high concurrency, low cost, and reproducible evaluation and training, reducing the environmental differences and uncertainties brought about by real machines / traditional simulators. By providing an injection interface for atomic gestures (atomic actions), it ensures that the capability boundaries of intelligent agents are consistent with human operations, facilitating cross-task / cross-application comparisons.
[0098] Furthermore, by employing declarative navigation configuration and discrete page state modeling, the page structure can be statically analyzed and page routing transition graphs can be generated, thereby enabling automated task construction. Here, the state of discrete pages can include different display modes determined by a finite set of parameters under the same page route.
[0099] In some application scenarios, a unified interactive identifier system and a static consistency verification mechanism can be established to bind "declaration-code-task" together, reducing the risk that the baseline environment becomes invalid as pages are updated. Here, the declaration is the navigation configuration file, the code is the front-end code, and the task can be a task generated from a reference task trajectory or a manually written task. The three are strongly bound by the same unique identifiers, and the static consistency verification mechanism ensures their consistency. The declaration layer is the single source of truth, the navigation graph and the tasks are automatically derived products, and consistency verification ensures that the code layer stays synchronized with the declaration layer.
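One possible form of the static consistency check is a set comparison over the shared identifiers. The sketch below assumes the identifiers have already been extracted from the navigation configuration file, the front-end code, and the task definitions; the function name and report keys are illustrative.

```python
def check_consistency(declared_ids, code_ids, task_ids):
    """Compare identifiers across the declaration, code, and task layers."""
    declared, code = set(declared_ids), set(code_ids)
    return {
        # Declared components that the front-end code does not implement.
        "missing_in_code": sorted(declared - code),
        # Components in the code that the declaration layer does not know about.
        "undeclared_in_code": sorted(code - declared),
        # Tasks that reference identifiers absent from the declaration layer.
        "unknown_in_tasks": sorted(set(task_ids) - declared),
    }


report = check_consistency(
    declared_ids=["btn_social", "chat_item", "btn_info"],
    code_ids=["btn_social", "chat_item", "btn_share"],
    task_ids=["btn_info", "btn_delete"],
)
is_consistent = not any(report.values())
```

Treating the declaration layer as the reference set mirrors its role as the single source of truth.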
[0100] In some application scenarios, the virtual simulator 300 can continuously expand the set of tasks and applications, and then maintain long-term availability and comparability through versioning and regression mechanisms. That is, agents evaluated at different times use the same version of the baseline environment (the same version of the virtual application and the same set of tasks), thus allowing for fair comparison of evaluation results. Furthermore, when the virtual application's pages are iteratively updated (e.g., adding features or modifying pages), the evaluation results of the old version are still retained and meaningful after the release of the new version of the baseline environment. The regression mechanism allows for regression verification on the same task set before the new version is released, ensuring that old tasks remain usable and judgments are still correct.
[0101] In some application scenarios, the virtual simulator can be extended with virtual operating system configurations for multiple devices and with evaluation matrices covering multiple themes / resolutions, in order to test the robustness of agents across device form factors.
[0102] In some application scenarios, the virtual simulator can provide a page injection interface through which pages such as pop-up ads, network errors, and application malfunctions are injected, in order to test the exception-handling capabilities of intelligent agents.
[0103] The training and evaluation method for intelligent agents provided by this invention can exist in the form of training and evaluation platform software, such as a toolchain that integrates "a collection of virtual operating systems built in a browser + simulation applications + evaluation framework", which can be used for training and evaluation of intelligent agents on mobile devices.
[0104] Alternatively, the training and evaluation method for intelligent agents provided by this invention can also exist in the form of an SDK / middleware, for example, providing capabilities such as atomic gesture injection, virtual simulation environment state observation / writing, system service control (time / location / network / keyboard), task loading and judgment in the form of an interface, for integration by third-party task frameworks or training frameworks.
[0105] Alternatively, the training and evaluation method for intelligent agents provided by this invention can also exist in the form of a benchmark evaluation service, such as providing a fixed version of a virtual operating system and virtual applications, a set of tasks and a log / statistics pipeline.
[0106] The training and evaluation apparatus for intelligent agents provided by this invention is described below. The apparatus described below and the training and evaluation method described above may be read with reference to each other. As shown in Figure 4, the training and evaluation device 400 for intelligent agents includes: an initialization unit 401, used to initialize the state and task description information of the virtual simulation environment defined by the front-end code, wherein the virtual simulation environment includes a virtual operating system and virtual applications; a page rendering unit 402, used to generate a virtual interactive page according to the state of the virtual operating system and the state of the virtual application; and a training and evaluation unit 403, used to train or evaluate the intelligent agent using the virtual interactive page and the task description information.
[0107] According to the training and evaluation device 400 for an intelligent agent provided by the present invention, the training and evaluation unit 403 trains or evaluates the intelligent agent using the virtual interactive page and the task description information by: using the virtual interactive page and the task description information as inputs to the agent to obtain the atomic action output by the agent; updating the state of the virtual operating system and / or the state of the virtual application based on the atomic action; determining whether the preset interaction stop condition is met; and, if so, evaluating the task execution results of the agent based on the updated state of the virtual operating system and / or the state of the virtual application to obtain the evaluation results of the agent.
[0108] According to the training and evaluation device 400 for an intelligent agent provided by the present invention, the training and evaluation unit 403 is further configured to: if the interaction stop condition is not met, perform iterative interaction operations until the interaction stop condition is met, wherein the iterative interaction operations include: using the virtual interactive page generated from the updated state of the virtual operating system and / or the state of the virtual application as the input of the agent to obtain the atomic action output by the agent in the current interaction; and updating the state of the virtual operating system and / or the state of the virtual application based on the atomic action output by the agent in the current interaction.
[0109] According to the training and evaluation device 400 for an intelligent agent provided by the present invention, the training and evaluation unit 403 evaluates the task execution results of the intelligent agent based on the updated state of the virtual operating system and / or the state of the virtual application, to obtain the evaluation results of the intelligent agent, by: calling the preset structured state interface to obtain the state tree of the virtual operating system and / or virtual application; and evaluating the task execution results of the agent based on that state tree to obtain the evaluation results of the agent.
[0110] According to the training and evaluation device 400 for an intelligent agent provided by the present invention, the training and evaluation unit 403 evaluates the task execution results of the intelligent agent based on the state tree of the virtual operating system and / or virtual application, to obtain the evaluation results of the intelligent agent, by: comparing the values in at least some key-value pairs in the state tree of the virtual operating system and / or virtual application with their corresponding target values; if the comparison result is a complete match, confirming that the evaluation result of the agent is that the task execution succeeded; or, if the comparison result is a partial mismatch, confirming that the evaluation result of the agent is that the task execution failed.
[0111] According to the training and evaluation device 400 for an intelligent agent provided by the present invention, the training and evaluation unit 403 updates the state of the virtual operating system and / or the state of the virtual application based on the atomic action by: querying the preset navigation configuration file to determine the jump rules of all interactive components in the virtual interactive page; identifying the target interactive component corresponding to the atomic action in the virtual interactive page; and determining the value of the current page route in the state tree based on the action type in the atomic action and the jump rule of the target interactive component, wherein the value of the current page route indicates the virtual interactive page after the jump.
[0112] According to the training and evaluation device 400 for an intelligent agent provided by the present invention, the training and evaluation unit 403 is further configured to: when the atomic action carries input data, use the input data to update, in the state tree, the business data of the virtual interactive page reached after the jump.
[0113] According to the training and evaluation device 400 for an intelligent agent provided by the present invention, the initialization unit 401 initializes the state of the virtual simulation environment defined by front-end code by: initializing the states of the virtual operating system and the virtual application to a basic initial state; and, based on the task requirements corresponding to the task description information, adjusting the basic initial state of the virtual operating system and / or the virtual application to obtain the state corresponding to the task requirements.
[0114] According to the training and evaluation device 400 for an intelligent agent provided by the present invention, the initialization unit 401 is further configured to: parse the preset navigation configuration file to generate a page routing transition graph; perform a path search in the page routing transition graph to determine several reference task trajectories; and, based on each reference task trajectory, generate several pieces of reference task description information, each of which can be used as the task description information.
[0115] Figure 5 is a schematic diagram of the physical structure of an electronic device. As shown in Figure 5, the electronic device may include a processor 510, a communications interface 520, a memory 530, and a communication bus 540, wherein the processor 510, communications interface 520, and memory 530 communicate with each other via the communication bus 540. The processor 510 can call logical instructions in the memory 530 to execute a training and evaluation method for an intelligent agent. This method includes: initializing the state and task description information of a virtual simulation environment defined by front-end code, wherein the virtual simulation environment includes a virtual operating system and virtual applications; generating a virtual interactive page according to the state of the virtual operating system and the state of the virtual applications; and training or evaluating the intelligent agent using the virtual interactive page and the task description information.
[0116] Furthermore, the logical instructions in the aforementioned memory 530 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0117] In another aspect, the present invention also provides a computer program product, which includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can execute the training and evaluation method for intelligent agents provided above. The method includes: initializing the state and task description information of a virtual simulation environment defined by front-end code, wherein the virtual simulation environment includes a virtual operating system and a virtual application; generating a virtual interactive page according to the state of the virtual operating system and the state of the virtual application; and training or evaluating the intelligent agent using the virtual interactive page and the task description information.
[0118] In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements a training and evaluation method for an intelligent agent provided by the methods described above. This method includes: initializing the state and task description information of a virtual simulation environment defined by front-end code, the virtual simulation environment including a virtual operating system and virtual applications; generating a virtual interactive page according to the state of the virtual operating system and the state of the virtual applications; and training or evaluating the intelligent agent using the virtual interactive page and the task description information.
[0119] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0120] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0121] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A method for training and evaluating an intelligent agent, characterized in that it comprises: initializing the state and task description information of a virtual simulation environment defined by front-end code, the virtual simulation environment including a virtual operating system and virtual applications; generating a virtual interactive page based on the state of the virtual operating system and the state of the virtual application; and training or evaluating the intelligent agent using the virtual interactive page and the task description information.
2. The method according to claim 1, characterized in that training or evaluating the intelligent agent using the virtual interactive page and the task description information comprises: using the virtual interactive page and the task description information as input to the intelligent agent to obtain an atomic action output by the intelligent agent; updating the state of the virtual operating system and/or the state of the virtual application based on the atomic action; determining whether a preset interaction stop condition is met; and if so, evaluating a task execution result of the intelligent agent based on the updated state of the virtual operating system and/or the updated state of the virtual application to obtain an evaluation result of the intelligent agent.
3. The method according to claim 2, characterized in that the method further comprises: if the interaction stop condition is not met, performing an iterative interaction operation until the interaction stop condition is met, the iterative interaction operation comprising: using a virtual interactive page generated based on the updated state of the virtual operating system and/or the updated state of the virtual application as input to the intelligent agent to obtain the atomic action output by the intelligent agent in the current interaction; and updating the state of the virtual operating system and/or the state of the virtual application based on the atomic action output by the intelligent agent in the current interaction.
4. The method according to claim 2, characterized in that evaluating the task execution result of the intelligent agent based on the updated state of the virtual operating system and/or the updated state of the virtual application to obtain the evaluation result of the intelligent agent comprises: calling a preset structured state interface to obtain a state tree of the virtual operating system and/or the virtual application; and evaluating the task execution result of the intelligent agent based on the state tree of the virtual operating system and/or the virtual application to obtain the evaluation result of the intelligent agent.
5. The method according to claim 4, characterized in that evaluating the task execution result of the intelligent agent based on the state tree of the virtual operating system and/or the virtual application to obtain the evaluation result of the intelligent agent comprises: comparing the values of at least some of the key-value pairs in the state tree of the virtual operating system and/or the virtual application with their corresponding target values; and if the comparison result is a complete match, determining that the evaluation result of the intelligent agent is that the task execution succeeded; or, if the comparison result includes any mismatch, determining that the evaluation result of the intelligent agent is that the task execution failed.
6. The method according to claim 4, characterized in that updating the state of the virtual operating system and/or the state of the virtual application based on the atomic action comprises: querying a preset navigation configuration file to determine the jump rules for all interactive components in the virtual interactive page; identifying, in the virtual interactive page, the target interactive component corresponding to the atomic action; and determining the value of the current page route in the state tree based on the action type in the atomic action and the jump rule of the target interactive component, the value of the current page route indicating the virtual interactive page after the jump.
7. The method according to claim 6, characterized in that the method further comprises: when the atomic action carries input data, using the input data to update the business data in the state tree that corresponds to the virtual interactive page after the jump.
8. The method according to any one of claims 1 to 7, characterized in that initializing the state of the virtual simulation environment defined by the front-end code comprises: initializing the state of the virtual operating system and the state of the virtual application to a basic initial state; and adjusting the basic initial state of the virtual operating system and/or the virtual application based on the task requirements corresponding to the task description information to obtain a state corresponding to the task requirements.
9. The method according to any one of claims 1 to 7, characterized in that the method further comprises: parsing a preset navigation configuration file to generate a page route transition graph; performing a path search on the page route transition graph to determine several reference task trajectories; and generating several pieces of reference task description information based on the reference task trajectories, each piece of reference task description information being usable as the task description information.
10. A device for training and evaluating an intelligent agent, characterized by comprising: an initialization unit configured to initialize task description information and a state of a virtual simulation environment defined by front-end code, the virtual simulation environment including a virtual operating system and a virtual application; a page rendering unit configured to generate a virtual interactive page based on the state of the virtual operating system and the state of the virtual application; and a training and evaluation unit configured to train or evaluate the intelligent agent using the virtual interactive page and the task description information.
11. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, the method for training and evaluating an intelligent agent according to any one of claims 1 to 9 is implemented.
12. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, the method for training and evaluating an intelligent agent according to any one of claims 1 to 9 is implemented.
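
For illustration only, the interaction and evaluation flow recited in claims 2 to 7 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the jump-rule table `NAV_RULES`, the `AtomicAction` fields, the flat state tree with `route` and `data` keys, the `"stop"` action type, the `max_steps` limit used as the interaction stop condition, and the scripted agent are all hypothetical names and choices introduced here for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical jump rules standing in for the navigation configuration file:
# (current route, component id, action type) -> route after the jump.
NAV_RULES = {
    ("home", "settings_icon", "tap"): "settings",
    ("settings", "name_field", "type"): "settings",
}


@dataclass
class AtomicAction:
    action_type: str                   # "tap", "type" or "stop" in this sketch
    component_id: str = ""
    input_data: Optional[dict] = None  # data carried by the action, if any


def render_page(state: dict) -> dict:
    """Stand-in for page generation: expose route, components and data."""
    route = state["route"]
    components = sorted({c for (r, c, _) in NAV_RULES if r == route})
    return {"route": route, "components": components, "data": dict(state["data"])}


def apply_action(state: dict, action: AtomicAction) -> None:
    """Update the state tree: follow the jump rule, then merge input data."""
    key = (state["route"], action.component_id, action.action_type)
    state["route"] = NAV_RULES.get(key, state["route"])  # no rule: stay on page
    if action.input_data:
        state["data"].update(action.input_data)


def evaluate(state: dict, targets: dict) -> bool:
    """Success only if every checked key matches its target value."""
    return all(state["data"].get(k) == v for k, v in targets.items())


def run_episode(agent, state: dict, task: str, targets: dict,
                max_steps: int = 10) -> bool:
    """Iterate page -> action -> state update until the stop condition holds."""
    for _ in range(max_steps):
        action = agent(render_page(state), task)
        if action.action_type == "stop":
            break
        apply_action(state, action)
    return evaluate(state, targets)


def scripted_agent(page: dict, task: str) -> AtomicAction:
    """Hypothetical rule-based agent used only to exercise the loop."""
    if page["route"] == "home":
        return AtomicAction("tap", "settings_icon")
    if page["data"].get("device_name") != "Lab phone":
        return AtomicAction("type", "name_field", {"device_name": "Lab phone"})
    return AtomicAction("stop")


if __name__ == "__main__":
    state = {"route": "home", "data": {"device_name": "Phone"}}
    ok = run_episode(scripted_agent, state,
                     "Rename the device to 'Lab phone'",
                     {"device_name": "Lab phone"})
    print("task succeeded:", ok)
```

In this sketch the evaluation reads the state tree directly rather than inspecting the rendered page, which mirrors the comparison of key-value pairs against target values described in claim 5; a real agent would replace `scripted_agent`.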