Method, device and equipment for artificial intelligence agent to interact with webpage and storage medium
By reading AI manifest files and component-level AI contracts, a target website operation plan is generated, which solves various shortcomings of existing AI agents in web page interaction and achieves efficient, stable and secure web page operation.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: CHINA PING AN PROPERTY INSURANCE CO LTD
- Filing Date: 2026-03-06
- Publication Date: 2026-06-12
Figure CN122196252A_ABST
Abstract
Description
Technical Field
[0001] The embodiments of the present invention relate to the fields of World Wide Web front-end technology and human-computer interaction technology, and particularly to a method, device, equipment and storage medium for interaction between an artificial intelligence agent and a web page.
Background Technology
[0002] In the digital age, web pages have become a core carrier of information and services. Artificial intelligence (AI) agents are intelligent programs capable of perceiving their environment, autonomously planning, and executing tasks to achieve goals. For example, AI agents are widely used in the financial and medical fields to facilitate web page interactions in their respective areas. When an AI agent interacts with a web page, it transforms into an automated, intelligent "virtual operator." It can not only browse pages, click links, and fill out forms like a human, but also deeply understand the content and structure of the web page, completing a series of tasks from accurate information extraction and automatic data summarization to the automated execution of complex processes. This liberates users from repetitive and tedious online operations, opening up new possibilities for intelligent interaction.
[0003] However, owing to shortcomings in current website design, existing AI agents performing web page tasks typically face the following limitations:
1. Page content is “readable” but tasks are not “executable”: web pages lack contracts that AI agents can invoke directly, such as the operation intent, preconditions, post-states, and failure modes of forms and buttons, which makes it difficult to execute such tasks accurately.
[0004] 2. The information presented to AI agents by web pages is mostly static semantics, with insufficient description of dynamic page states, such as pop-ups, virtual scrolling, permission thresholds, and asynchronous verification, making it difficult for AI agents to understand the behaviors supported by web pages.
[0005] 3. There is a lack of versioning, capability negotiation, and rate limiting interaction between AI agents and web pages.
[0006] 4. AI agents heavily rely on DOM selectors for content recognition and crawling. When web pages are redesigned, content recognition and crawling may fail, resulting in high maintenance costs.
[0007] 5. AI agents cannot adequately understand the business intent of each node on a webpage, requiring additional script information to be input manually to assist the agent in making inferences.
[0008] 6. The lack of standardized expressions for compliance and privacy boundaries makes AI agents susceptible to unauthorized operations and data leaks.
[0009] 7. The AI agent cannot accurately map UI components to business process diagrams, making it difficult to accurately associate page content with callable APIs, resulting in operation lag or failure.
Summary of the Invention
[0010] To address the aforementioned technical problems, embodiments of the present invention provide a method for interaction between an AI agent and a web page, applied to an AI agent system, the method comprising:
in response to a user instruction, determining the target website to be interacted with;
reading an AI manifest file, which contains usage information of the target website, the usage information being used to inform the AI agent system of the behaviors supported by the target website and the website performance of the target website;
loading a component-level AI contract, which contains purpose information for each front-end component of the target website;
determining, based on the user instruction, the AI manifest file, and the component-level AI contract, the set of executable behaviors of the AI agent system on the target website;
generating a target website operation plan for the user instruction by combining the AI manifest file and the set of executable behaviors; and
if the target website operation plan meets the execution requirements, executing the target website operation plan on the target website to obtain the execution result.
[0011] In one embodiment, reading the AI manifest file includes: reading the website's public actions, data access constraints, component capability summaries, and website security policies from the AI manifest file; and loading the component-level AI contract includes: loading the component-level AI contract to determine the functional intent, state, and input and output data of the front-end components, and the interaction logic between different front-end components.
[0012] In one embodiment, determining the set of executable behaviors of the AI agent system on the target website based on the user instructions, the AI manifest file, and the component-level AI contract includes: Based on the AI manifest file and component-level AI contracts, executable candidate behaviors are determined; Generate a structured prompt header corresponding to the user instruction based on the user instruction and candidate behaviors; All executable behaviors are determined by negotiating the structured prompt header and candidate behaviors, and these executable behaviors form the executable behavior set.
[0013] In one embodiment, the method further includes: Based on the state changes of the target website, runtime information for the pages of the target website is generated. The runtime information includes information on visible components in the current page, page context information, and differential changes of the page.
[0014] In one embodiment, generating a target website operation plan for the user instruction by combining the AI manifest file and the set of executable behaviors includes: The current state of the target website is determined based on the runtime information; Based on the AI manifest file, determine the website behaviors and website performance supported by the target website; Based on the current state, supported website behaviors and website performance, and the set of executable behaviors, a target website operation plan is generated for the user's instructions.
[0015] In one embodiment, the method further includes: In response to the user instruction execution process involving sensitive operations, the AI agent system performs permission checks and user authorization information queries for the sensitive operations. If it is determined that the AI agent system has the necessary operating permissions and user authorization information, the AI agent system is allowed to perform the sensitive operation.
[0016] In one embodiment, the method further includes: in response to obtaining the execution result, determining the degree of matching between the execution result and the user instruction; recording the failed actions and error pages generated during execution of the target website operation plan; obtaining the user's satisfaction score for the execution result; and optimizing the structured prompt header and the set of executable behaviors by combining the matching degree, failed actions, error pages, and satisfaction score.
[0017] Another embodiment of the present invention also provides an artificial intelligence agent and web page interaction device, applied to an AI agent system, the device comprising: The first determination module is used to determine the target website to be interacted with in response to user instructions; A reading module is used to read an AI manifest file, which contains usage information of the target website. This usage information is provided to the AI agent system so that it can understand the behaviors supported by the target website and the website performance of the target website. A loading module is used to load component-level AI contracts, which contain usage information for each front-end component of the target website; The second determining module is used to determine the set of executable behaviors of the AI agent system on the target website based on the user instructions, the AI manifest file, and the component-level AI contract. The generation module is used to combine the AI manifest file and the set of executable behaviors to generate a target website operation plan for the user instructions; The execution module is used to execute the target website operation plan on the target website and obtain the execution result if the target website operation plan meets the execution requirements.
[0018] Another embodiment of the present invention also provides an electronic device, comprising: one or more processors; and a memory configured to store one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the AI agent and webpage interaction method as described above.
[0019] Another embodiment of the present invention provides a storage medium having a computer program stored thereon, which, when executed by a processor, implements the artificial intelligence agent and web page interaction method as described above.
[0020] Based on the solutions disclosed in the above embodiments, it is evident that the method described in this embodiment can significantly improve the success rate and stability of AI, making the process intent clear, the status definite, and the failure recoverable, reducing "random clicks / fills". It significantly reduces maintenance costs, eliminating the need to re-record scripts and selectors; only minor adjustments are required at the contract layer. Furthermore, the overall solution is secure and compliant, achieving the effects of configuring a protection network for dangerous actions, minimizing data, and ensuring auditable traceability. In addition, it is consistent across platforms; changes in UI formats such as mobile / desktop / mini-programs do not affect the AI agent system's understanding of tasks, making it widely applicable. This embodiment's solution is compatible with existing SEO / accessibility solutions and will not disrupt the original ecosystem and performance budget of the AI system.
[0021] Other features and advantages of this application will be set forth in the description which follows, and will be apparent in part from the description, or may be learned by practicing the application. The objectives and other advantages of this application may be realized and obtained by means of the structures particularly pointed out in the written description, claims, and drawings.
[0022] The technical solution of this application will be further described in detail below with reference to the accompanying drawings and embodiments.
Brief Description of the Drawings
[0023] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0024] Figure 1 is a flowchart illustrating the AI agent and webpage interaction method in an embodiment of the present invention.
[0025] Figure 2 is a flowchart illustrating the AI agent and webpage interaction method in another embodiment of the present invention.
[0026] Figure 3 is a flowchart illustrating the AI agent and webpage interaction method in another embodiment of the present invention.
[0027] Figure 4 is a flowchart illustrating the AI agent and webpage interaction method in an embodiment of the present invention.
Detailed Implementation
[0028] The specific embodiments of the present invention will now be described in detail with reference to the accompanying drawings, but these are not intended to limit the scope of the invention.
[0029] It should be understood that various modifications can be made to the embodiments disclosed herein. Therefore, the following description should not be considered as limiting, but merely as an example of embodiments. Other modifications within the scope of this disclosure will be apparent to those skilled in the art.
[0030] The accompanying drawings, which are included in and form part of this specification, illustrate embodiments of the present disclosure and, together with the general description of the disclosure given above and the detailed description of the embodiments given below, serve to explain the principles of the disclosure.
[0031] These and other features of the invention will become apparent from the following description of preferred forms of embodiments given as non-limiting examples, with reference to the accompanying drawings.
[0032] It should also be understood that although the invention has been described with reference to some specific examples, those skilled in the art can certainly implement many other equivalent forms of the invention, which have the features described in the claims and are therefore all within the scope of protection defined herein.
[0033] The above and other aspects, features and advantages of this disclosure will become more apparent when taken in conjunction with the accompanying drawings and in view of the following detailed description.
[0034] Specific embodiments of the present disclosure are described hereinafter with reference to the accompanying drawings; however, it should be understood that the disclosed embodiments are merely examples of the present disclosure, which can be implemented in various ways. Well-known and / or repeated functions and structures are not described in detail, to avoid obscuring the present disclosure with unnecessary or redundant detail. Therefore, the specific structural and functional details disclosed herein are not intended to be limiting, but merely serve as a basis for the claims and as a representative basis for teaching those skilled in the art to variously employ the present disclosure in substantially any appropriately detailed structure.
[0035] This specification may use the phrases “in one embodiment,” “in another embodiment,” “in yet another embodiment,” or “in still another embodiment,” all of which may refer to one or more of the same or different embodiments according to this disclosure.
[0036] The embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
[0037] As shown in Figure 1, this embodiment of the invention provides a method for AI agent interaction with web pages, applied to an AI agent system. The method includes:
S1: In response to a user instruction, determine the target website to be interacted with;
S2: Read the AI manifest file, which contains usage information of the target website. The usage information is provided to the AI agent system to enable it to understand the behaviors supported by the target website and the website performance of the target website;
S3: Load the component-level AI contract, which contains the purpose information of each front-end component of the target website;
S4: Determine the set of executable behaviors of the AI agent system on the target website based on the user instruction, the AI manifest file and the component-level AI contract;
S5: Combine the AI manifest file and the set of executable behaviors to generate a target website operation plan for the user instruction;
S6: If the target website operation plan meets the execution requirements, execute the target website operation plan on the target website and obtain the execution result.
[0038] The method described in this embodiment is applied to an AI agent system, and its application field is not limited. It can be applied to fields such as finance and healthcare. For example, it can instruct the AI agent system to perform data query services, including insurance policy inquiries and drug inquiries, or to purchase financial and medical products. Specifically, users can instruct the AI agent system to periodically renew insurance policies on insurance platforms based on historical operation records, upload physiological monitoring data for users on medical platforms, purchase long-term medications, or proactively interact with corresponding web pages on medical platforms based on the user's recent physical condition input to search for relevant medical information, and even interact with virtual doctors online. In the education field, the AI agent system can automatically collect new educational information for users and purchase student workbooks, etc. The solution in this embodiment can be used for the release of AI-readable and controllable web page semantics and interaction contracts. This embodiment integrates "AI Manifest + Component Contract + Runtime State Flow + Capability Negotiation + Security and Compliance" into an integrated solution, enabling the AI agent system to stably understand the page structure, front-end component intent and executable actions of the web page without being fragile or exceeding its authority. It can also adaptively adjust to changes, making it highly flexible, widely applicable, and capable of efficiently executing web page agent services.
[0039] Specifically, in one embodiment, the AI manifest file is a pre-defined file. It may be produced by the system through analysis and organization of historical data, obtained from a webpage through interaction with that webpage, or compiled and entered manually; the method of acquisition is not limited to any one of these. In this embodiment, the AI manifest file includes publicly available website actions, data access constraints, component capability summaries, and the website's security policies. For example, the AI manifest file is a published page-level knowledge graph and task catalog, containing: page entities, executable actions and parameter patterns (JSON Schema), preconditions, success / failure post-states, rate and permission requirements, version number, and contact and appeal entry points. Furthermore, this manifest supports representation in JSON-LD / Extensible Self-Descriptive format and supports multiple languages, canary (gray-release) versions, and A / B versions.
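By way of illustration only, a manifest of this kind could be sketched as follows. The embodiment does not fix a concrete schema, so every field name here (`version`, `actions`, `params`, `preconditions`, `post_states`, `rate_limit_per_minute`, `requires_permission`) and both action names are hypothetical assumptions; the sketch only shows how an agent might index the declared actions and check required parameters before planning.

```python
import json

# Hypothetical AI manifest. All field names and action names are
# illustrative assumptions, not a schema defined by the embodiment.
MANIFEST_JSON = """
{
  "version": "2",
  "actions": [
    {
      "name": "query_policy",
      "params": {"required": ["policy_id"]},
      "preconditions": [],
      "post_states": {"success": "policy.shown", "failure": "policy.not_found"},
      "rate_limit_per_minute": 60,
      "requires_permission": "policy:read"
    },
    {
      "name": "renew_policy",
      "params": {"required": ["policy_id", "term_months"]},
      "preconditions": ["user.logged_in"],
      "post_states": {"success": "policy.renewed", "failure": "policy.renewal_failed"},
      "rate_limit_per_minute": 10,
      "requires_permission": "policy:write"
    }
  ]
}
"""


def load_manifest(text):
    """Index the manifest's actions by name for quick lookup."""
    manifest = json.loads(text)
    return {action["name"]: action for action in manifest["actions"]}


def missing_params(action, args):
    """Return the required parameters that the caller has not supplied."""
    return [p for p in action["params"]["required"] if p not in args]


actions = load_manifest(MANIFEST_JSON)
print(sorted(actions))
print(missing_params(actions["renew_policy"], {"policy_id": "P-001"}))
```

Declaring preconditions and post-states next to each action is what lets a planner reason about ordering without inspecting the page markup.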
[0040] The component-level AI contract is a pre-negotiated and configured contract between the system and the web page. This contract includes the functional intent, state, input and output data of the front-end components, and the interaction logic between different front-end components. For example, the component-level AI contract can be represented at the component level as a contract fragment attached to data attributes (such as `data-ai-role="address-form"`, `data-ai-action="submit"`) or the Shadow DOM. This fragment includes: field semantics, validation rules, masking strategies (PII / sensitive fields), visibility / interactivity state machine, error codes, and recovery instructions. Furthermore, the component-level AI contract in this embodiment can also provide selector stabilization and an intent selector to shield against vulnerabilities caused by visual / structural fine-tuning.
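As a minimal sketch of how an agent might read such contract fragments, the following uses Python's standard `html.parser` to collect every element that carries `data-ai-*` attributes. The attributes `data-ai-role` and `data-ai-action` come from the example above; `data-ai-field`, `data-ai-sensitive`, the sample markup, and the helper names are assumptions introduced for illustration.

```python
from html.parser import HTMLParser

# Sample markup. The class names stand in for presentation details that
# may change in a redesign; the data-ai-* attributes carry the contract.
PAGE = """
<form data-ai-role="address-form" class="x1y2">
  <input name="street" data-ai-field="street-address" data-ai-sensitive="pii">
  <button class="btn-7f" data-ai-action="submit">Save</button>
</form>
"""


class ContractParser(HTMLParser):
    """Collects the data-ai-* attributes of every element that has any."""

    def __init__(self):
        super().__init__()
        self.components = []

    def handle_starttag(self, tag, attrs):
        prefix = "data-ai-"
        contract = {k[len(prefix):]: v for k, v in attrs if k.startswith(prefix)}
        if contract:
            self.components.append({"tag": tag, **contract})


def extract_contracts(html):
    """Return one dict per contract-bearing element, in document order."""
    parser = ContractParser()
    parser.feed(html)
    return parser.components


def find_by_intent(components, **wanted):
    """Select components by declared intent rather than by CSS selector."""
    return [c for c in components if all(c.get(k) == v for k, v in wanted.items())]


components = extract_contracts(PAGE)
print(find_by_intent(components, action="submit"))
```

Because the lookup keys on declared intent, renaming the CSS classes or rearranging the markup does not change the result, which is the selector-stability property the paragraph describes.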
[0041] By reading the AI manifest file, the system presents the target website's behaviors in a structured data format to the AI agent system. Therefore, through the manifest file, the AI agent system can better understand the target website's performance, and thus its business capabilities. Furthermore, by loading component-level AI contracts, the agent system can learn about the semantic contracts pre-attached to each front-end component on the webpage. In other words, by reading and understanding the contracts, the agent system can comprehend the semantic purpose of the webpage's UI, i.e., the purpose of each front-end component, rather than simply guessing the component's meaning from its displayed content (icons). This lays the foundation for the system's subsequent use of front-end components.
[0042] Furthermore, determining the set of executable behaviors of the AI agent system on the target website based on the user instructions, the AI manifest file, and the component-level AI contract includes:
S401: Determine executable candidate behaviors based on the AI manifest file and component-level AI contracts;
S402: Generate a structured prompt header corresponding to the user instruction based on the user instruction and candidate behaviors;
S403: Combine the structured prompt header and candidate behaviors to negotiate and determine all executable behaviors, which form the executable behavior set.
[0043] In this embodiment, the system first screens the candidate behaviors that the AI agent system can execute on the target website, based on the content of the AI manifest file and the component-level AI contract. This step filters out non-executable behaviors and retains those the agent system can execute. For example, the system first determines the behaviors supported by the website and then filters them according to the permissions of the AI agent system to obtain the candidate behaviors. Next, the system automatically generates structured prompt headers (request headers / response headers) based on the user instruction and the candidate behaviors. Then, through the prompt headers (for example, vendor+capabilities, `AI-Hints: actions=v2;state=on;rate=60/m`), it completes the negotiation of version, capabilities, rate control, and authorization scope in the first handshake round, clearly defining which behaviors are allowed, prohibited, or require confirmation. This determines the set of behaviors that the system can execute for the user instruction, that is, the set of executable behaviors, which matches the AI's operation permission scope.
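The negotiation described above can be sketched roughly as follows. The header value mirrors the `AI-Hints` example in the text; how the fields are interpreted, the per-behavior `scope` and `sensitive` attributes, and all function names are assumptions made for illustration, not a defined protocol.

```python
def parse_ai_hints(value):
    """Parse a header value such as 'actions=v2;state=on;rate=60/m'."""
    hints = {}
    for part in value.split(";"):
        key, sep, val = part.partition("=")
        if sep:
            # Tolerate stray spaces, e.g. 'rate=60 / m'.
            hints[key.strip()] = val.replace(" ", "")
    return hints


def rate_per_minute(hints):
    """Read the negotiated rate; only the per-minute unit 'm' is handled."""
    count, _, unit = hints["rate"].partition("/")
    if unit != "m":
        raise ValueError(f"unsupported rate unit: {unit!r}")
    return int(count)


def negotiate(candidates, agent_scopes):
    """Split candidate behaviors into allowed / confirm / prohibited."""
    result = {"allowed": [], "confirm": [], "prohibited": []}
    for behavior in candidates:
        if behavior["scope"] not in agent_scopes:
            result["prohibited"].append(behavior["name"])
        elif behavior["sensitive"]:
            result["confirm"].append(behavior["name"])
        else:
            result["allowed"].append(behavior["name"])
    return result


hints = parse_ai_hints("actions=v2;state=on;rate=60/m")
candidates = [
    {"name": "query_policy", "scope": "policy:read", "sensitive": False},
    {"name": "renew_policy", "scope": "policy:write", "sensitive": True},
    {"name": "delete_account", "scope": "account:admin", "sensitive": True},
]
outcome = negotiate(candidates, agent_scopes={"policy:read", "policy:write"})
print(rate_per_minute(hints), outcome)
```

In this sketch the executable behavior set is the union of the `allowed` and `confirm` lists; behaviors outside the agent's authorization scope never reach the planner.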
[0044] In addition, the agent system also supports a fallback (degradation) strategy: if the client does not support this standard, it falls back to traditional accessibility / structured data.
[0045] In one embodiment, the method further includes: S7: Generate runtime information for the page of the target website based on the state changes of the target website. The runtime information includes information on visible components in the current page, page context information, and differential changes of the page.
[0046] In other words, in this embodiment, the system generates runtime information for the page whenever the website is launched or the webpage changes. This information (event stream) serves as an important reference for the system to plan its next action and can also provide references for latency and frequency limit configurations (governance parameters). When generating runtime information, read-only state stream endpoints can be exposed through EventSource / WebSocket, and key state changes can be reported using differential snapshots or semantic events (such as modal.opened, form.invalid, captcha.required), reducing the polling and re-rendering inference costs of the AI agent system.
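A minimal sketch of the consuming side of such an event stream is given below. The event types `modal.opened`, `form.invalid`, and `captcha.required` are taken from the text; the event payload fields, the shape of the page state, and the function names are assumptions. Transport over EventSource or WebSocket is omitted, and the events are supplied as a plain list.

```python
from functools import reduce


def apply_event(state, event):
    """Return a new page state with one semantic event applied."""
    new_state = {
        "visible": set(state["visible"]),
        "blockers": set(state["blockers"]),
        "errors": dict(state["errors"]),
    }
    kind = event["type"]
    if kind == "modal.opened":
        new_state["visible"].add(event["component"])
        new_state["blockers"].add("modal")
    elif kind == "form.invalid":
        new_state["errors"][event["component"]] = list(event["fields"])
    elif kind == "captcha.required":
        new_state["blockers"].add("captcha")
    # Unknown event types are ignored so that newer sites stay compatible.
    return new_state


initial = {"visible": {"address-form"}, "blockers": set(), "errors": {}}
events = [
    {"type": "form.invalid", "component": "address-form", "fields": ["zip"]},
    {"type": "modal.opened", "component": "confirm-dialog"},
    {"type": "captcha.required"},
]
current = reduce(apply_event, events, initial)
print(current)
```

Folding small events into a local state copy means the agent never has to re-read or re-render the whole page to learn that, for example, a captcha now blocks progress.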
[0047] With continued reference to Figure 2, the step of generating a target website operation plan for the user instruction by combining the AI manifest file and the set of executable behaviors includes:
S501: Determine the current state of the target website based on the runtime information;
S502: Determine the website behaviors and website performance supported by the target website based on the AI manifest file;
S503: Combine the current state, supported website behaviors and website performance, and the set of executable behaviors to generate a target website operation plan for the user instruction.
[0048] The steps refer to the AI agent system automatically generating an action plan after understanding the website's capabilities and current operating status. This means generating a target website operation plan for the user's instructions. By executing this target website operation plan, the AI agent system can effectively and accurately complete the user's instructions and present the user with the required instruction processing results.
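One possible reading of steps S501 to S503 is sketched below: each requested behavior is checked against the executable set and against the preconditions declared in the manifest, and the expected post-state of each step is carried forward as a fact for the next one. The manifest fields (`preconditions`, `success_state`), the action names, and the idea of representing the current state as a set of facts are illustrative assumptions.

```python
def build_plan(requested, executable, manifest, current_facts):
    """Order-check requested behaviors and attach their expected post-states."""
    facts = set(current_facts)  # copy, so the caller's state is not mutated
    plan = []
    for name in requested:
        if name not in executable:
            raise ValueError(f"behavior not in executable set: {name}")
        action = manifest[name]
        unmet = [p for p in action["preconditions"] if p not in facts]
        if unmet:
            raise ValueError(f"{name}: unmet preconditions {unmet}")
        plan.append({"action": name, "expect": action["success_state"]})
        facts.add(action["success_state"])  # assume success when planning ahead
    return plan


manifest = {
    "login": {"preconditions": [], "success_state": "user.logged_in"},
    "renew_policy": {
        "preconditions": ["user.logged_in"],
        "success_state": "policy.renewed",
    },
}
plan = build_plan(
    requested=["login", "renew_policy"],
    executable={"login", "renew_policy"},
    manifest=manifest,
    current_facts=set(),
)
print(plan)
```

A plan that raises here is one that does not meet the execution requirements; a real planner would presumably try to insert the missing step rather than fail outright.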
[0049] In another embodiment, as shown in Figure 3, to prevent the AI in the agent system from directly manipulating the DOM or API of the target website, the method further includes:
S8: In response to a sensitive operation being involved in the execution of the user instruction, the AI agent system performs a permission check and a user authorization information query for the sensitive operation;
S9: If it is determined that the AI agent system has the operating permissions and the user authorization information, the AI agent system is allowed to perform the sensitive operation.
[0050] For example, in this embodiment, after the target website operation plan is generated, it is not executed directly. Instead, it is handed over to another module of the system for verification and execution, such as the action execution proxy module. This module is different from the module (AI module) used to generate the target website operation plan. After obtaining the target website operation plan, the action execution proxy module verifies the legality of the actions and checks for unauthorized behavior. For example, dangerous operations (such as placing orders, transferring funds, and deleting) are uniformly executed in a controlled manner through the action proxy endpoint `/ai/act`. The controls applied may include, but are not limited to, mandatory explicit confirmation, CSRF / replay protection, context echoing, minimum necessary data transmission, and revocable tokens. At the same time, whitelisting and isolation sandbox policies are provided for operations such as uploading / downloading and third-party redirection. In addition, for sensitive actions involving payment, obtaining user personal information, uploading credentials, and modifying important data, the execution module performs double verification, comprising a permission check and explicit user authorization, to ensure that the AI agent system cannot bypass privacy protection. After confirming that the operation plan is correct as a whole and that each operation is authorized, the operation plan is executed, and the execution result is pushed to the AI module of the agent system.
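The double verification performed by the action execution proxy can be illustrated with the following sketch, in which a step runs only if the agent holds the required scope and, for sensitive actions, the user has explicitly confirmed it. The scope strings, the step fields, the audit-log format, and the function name are assumptions; CSRF / replay protection and token handling are deliberately left out.

```python
# Sensitive action names, following the examples in the text
# (placing orders, transferring funds, deleting).
SENSITIVE_ACTIONS = {"place_order", "transfer_funds", "delete"}


def execute_via_proxy(step, agent_scopes, confirmed_by_user, perform, audit_log):
    """Run one plan step only after permission and confirmation checks."""
    name = step["action"]
    if step["scope"] not in agent_scopes:
        outcome = {"status": "denied", "reason": "missing_permission"}
    elif name in SENSITIVE_ACTIONS and name not in confirmed_by_user:
        outcome = {"status": "needs_confirmation"}
    else:
        outcome = {"status": "done", "result": perform(step)}
    audit_log.append((name, outcome["status"]))  # every attempt is traceable
    return outcome


audit_log = []
step = {"action": "transfer_funds", "scope": "payments:write", "amount": 100}


def perform(s):
    # Stand-in for the real call to the target website.
    return f"executed {s['action']}"


first = execute_via_proxy(step, {"payments:write"}, set(), perform, audit_log)
second = execute_via_proxy(step, {"payments:write"}, {"transfer_funds"}, perform, audit_log)
print(first, second, audit_log)
```

Keeping this check in a module separate from the planner means a faulty or manipulated plan still cannot trigger a sensitive action without both conditions being met.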
[0051] In practical applications, the above process can be implemented based on field-level masks, machine-readable compliance statements (such as regional traffic restrictions, destination restrictions, and retention periods), audit IDs, and operation traceability processes. It can also provide selective exposure and minimized data export, satisfying the interface agreements regarding data subject rights (export / deletion).
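As a small illustration of field-level masking with minimized export, the sketch below applies a per-field policy in which unlisted fields are dropped by default. The policy vocabulary (`allow`, `mask`), the default-deny choice, and the sample record are assumptions introduced here rather than details given in the embodiment.

```python
def mask_value(value, keep=4):
    """Hide all but the last `keep` characters; short values are fully hidden."""
    text = str(value)
    if len(text) <= keep:
        return "*" * len(text)
    return "*" * (len(text) - keep) + text[-keep:]


def export_record(record, policy):
    """Apply a field-level policy; fields without a rule are not exported."""
    exported = {}
    for field, value in record.items():
        rule = policy.get(field)
        if rule == "allow":
            exported[field] = value
        elif rule == "mask":
            exported[field] = mask_value(value)
    return exported


record = {"holder": "Zhang San", "card_number": "6222000011112222", "notes": "vip"}
policy = {"holder": "allow", "card_number": "mask"}
print(export_record(record, policy))
```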
[0052] Furthermore, the method also includes:
S10: In response to obtaining the execution result, determine the degree of matching between the execution result and the user instruction;
S11: Record the failed actions and error pages generated during the execution of the target website operation plan;
S12: Obtain the user's satisfaction score for the execution result;
S13: Optimize the structured prompt header and the set of executable behaviors by combining the matching degree, failed actions, error pages, and satisfaction score.
[0053] In this embodiment, the agent system generates multi-dimensional signals, including AI execution accuracy, action failure statistics, user feedback ratings, and page error rates. It can also transmit anonymized indicators such as contract hit rate, distribution of operation failure reasons, step rollback rate, and ambiguous hotspots to guide contract iteration. Through the retrospective analysis of this information, the system can dynamically adjust the AI's structured prompts and risk-limited operations, adjust operational risks, and improve the planning quality of future agent tasks. This forms a self-evolving closed-loop system for the AI agent system, enabling its agent capabilities to continuously improve.
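The closed loop described above might be approximated as follows: execution records are summarized into a success rate, a rollback rate, and a distribution of failure reasons, and behaviors that fail too often are moved into the confirmation-required group. The record fields, the 0.5 threshold, the action names, and the specific adjustment rule are all hypothetical; the embodiment does not prescribe how the signals are combined.

```python
from collections import Counter


def summarize(runs):
    """Aggregate anonymized execution records into feedback signals."""
    total = len(runs)
    failures = Counter(r["failure_reason"] for r in runs if not r["success"])
    return {
        "success_rate": sum(r["success"] for r in runs) / total,
        "rollback_rate": sum(r["rolled_back"] for r in runs) / total,
        "failure_reasons": dict(failures),
    }


def actions_needing_confirmation(runs, threshold=0.5):
    """List behaviors whose failure rate exceeds the threshold."""
    per_action = {}
    for r in runs:
        ok, n = per_action.get(r["action"], (0, 0))
        per_action[r["action"]] = (ok + r["success"], n + 1)
    return sorted(a for a, (ok, n) in per_action.items() if 1 - ok / n > threshold)


runs = [
    {"action": "renew_policy", "success": True, "rolled_back": False, "failure_reason": None},
    {"action": "renew_policy", "success": False, "rolled_back": True, "failure_reason": "captcha.required"},
    {"action": "upload_report", "success": False, "rolled_back": False, "failure_reason": "form.invalid"},
    {"action": "upload_report", "success": False, "rolled_back": True, "failure_reason": "form.invalid"},
]
summary = summarize(runs)
risky = actions_needing_confirmation(runs)
print(summary, risky)
```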
[0054] As can be seen from the solutions in the above embodiments, this embodiment upgrades from "content semantics" to a fusion of "interaction semantics + state machine + action contract," enabling it to not only tell the AI module "what this is," but also "what it can do, when it can do it, and what it will get after doing it." This allows the AI module to thoroughly understand the target website and its front-end components. In application, this effect can be achieved by combining a stable intent selector with contractual components. Based on this, the coupling of the DOM structure can be effectively weakened; even if the page is redesigned, the AI's operation path will not be disrupted. That is, even if the page is redesigned, the AI module can clearly know what to operate and achieve accurate operation.
[0055] Furthermore, by using runtime differential and event stream operations, semantic events can replace polling / screenshot inference, significantly reducing inference costs and error rates. Through capability negotiation, the initial handshake between the AI agent system and the target website can be standardized, supporting backward compatibility, canary deployments, and multi-client differentiation. The design of action agents and security monitoring allows for built-in confirmation, tokenization, auditing, and revocation of high-risk operations, reducing unauthorized / false triggering behaviors by the AI agent system. In application, it can be combined with, but is not limited to, field-level privacy / compliance statements, transforming compliance from documentation into machine-readable and executable policies. The built-in observability and evaluation processes enable closed-loop improvement between plan generation, execution, and evaluation, continuously enhancing the usability and robustness of the AI.
[0056] Based on the solutions disclosed in the above embodiments, it is evident that the method described in this embodiment can significantly improve the success rate and stability of AI, making the process intent clear, the status definite, and the failure recoverable, reducing "random clicks / fills". It significantly reduces maintenance costs, eliminating the need to re-record scripts and selectors; only minor adjustments are required at the contract layer. Furthermore, the overall solution is secure and compliant, achieving the effects of configuring a protection network for dangerous actions, minimizing data, and ensuring auditable traceability. In addition, it is consistent across platforms; changes in UI formats such as mobile / desktop / mini-programs do not affect the AI agent system's understanding of tasks, making it widely applicable. This embodiment's solution is compatible with existing SEO / accessibility solutions and will not disrupt the original ecosystem and performance budget of the AI system.
[0057] As shown in Figure 4, another embodiment of the present invention provides an artificial intelligence agent and web page interaction device, applied to an AI agent system, the device comprising:
a first determination module, used to determine the target website to be interacted with in response to user instructions;
a reading module, used to read an AI manifest file, which contains usage information of the target website, this usage information being provided to the AI agent system so that it can understand the behaviors supported by the target website and the website performance of the target website;
a loading module, used to load component-level AI contracts, which contain usage information for each front-end component of the target website;
a second determining module, used to determine the set of executable behaviors of the AI agent system on the target website based on the user instructions, the AI manifest file, and the component-level AI contract;
a first generation module, used to combine the AI manifest file and the set of executable behaviors to generate a target website operation plan for the user instructions; and
a first execution module, used to execute the target website operation plan on the target website and obtain the execution result if the target website operation plan meets the execution requirements.
[0058] In one embodiment, reading the AI manifest file includes: reading the website's public actions, data access constraints, component capability summaries, and website security policies from the AI manifest file; and loading the component-level AI contract includes: loading the component-level AI contracts to determine the functional intent, state, input and output data, and interaction logic between different front-end components.
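The information read from the AI manifest file and the component-level AI contracts can be pictured as two small data structures. The following is a minimal sketch only: the source does not specify a file format or schema, so every class name, field name, and helper function here (`AIManifest`, `ComponentContract`, `read_manifest`, `load_contracts`) is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AIManifest:
    """Site-level manifest. Field names are illustrative, not from the source."""
    public_actions: list
    data_access_constraints: dict
    component_capabilities: dict
    security_policy: dict


@dataclass
class ComponentContract:
    """Contract for one front-end component. Field names are illustrative."""
    component_id: str
    intent: str
    states: list
    inputs: dict
    outputs: dict
    interacts_with: list = field(default_factory=list)


def read_manifest(raw: dict) -> AIManifest:
    """Build a manifest from already-parsed data; a missing key raises KeyError."""
    return AIManifest(
        public_actions=list(raw["public_actions"]),
        data_access_constraints=dict(raw["data_access_constraints"]),
        component_capabilities=dict(raw["component_capabilities"]),
        security_policy=dict(raw["security_policy"]),
    )


def load_contracts(raw_contracts: list) -> dict:
    """Index component contracts by their component id."""
    return {c["component_id"]: ComponentContract(**c) for c in raw_contracts}
```

Failing on a missing manifest key, rather than silently defaulting, is one possible design choice; it keeps an incomplete manifest from being mistaken for a permissive one.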
[0059] In one embodiment, determining the set of executable behaviors of the AI agent system on the target website based on the user instructions, the AI manifest file, and the component-level AI contract includes: determining executable candidate behaviors based on the AI manifest file and the component-level AI contracts; generating a structured prompt header corresponding to the user instruction based on the user instruction and the candidate behaviors; and determining all executable behaviors by negotiation between the structured prompt header and the candidate behaviors, these executable behaviors forming the executable behavior set.
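The three steps above (candidate behaviors, structured prompt header, negotiation) can be sketched as follows. The source does not define how negotiation is carried out; this sketch assumes, purely for illustration, that a candidate becomes executable when every input its contract requires is present in the header. The dictionary keys (`action`, `required_inputs`, `slots`, `candidate_actions`) and the function names are hypothetical.

```python
def candidate_behaviors(public_actions: list, contracts: list) -> dict:
    """Candidates are actions the manifest exposes AND a component contract implements."""
    implemented = {c["action"]: c for c in contracts}
    return {name: implemented[name] for name in public_actions if name in implemented}


def build_prompt_header(instruction: str, slots: dict, candidates: dict) -> dict:
    """Structured header: the instruction, values extracted from it, and candidate names."""
    return {
        "instruction": instruction,
        "slots": dict(slots),
        "candidate_actions": sorted(candidates),
    }


def negotiate(header: dict, candidates: dict) -> list:
    """Assumed rule: keep a candidate only if all of its required inputs are available."""
    executable = []
    for name in header["candidate_actions"]:
        required = candidates[name]["required_inputs"]
        if all(key in header["slots"] for key in required):
            executable.append(name)
    return executable
```

In this sketch the `slots` argument stands in for whatever the AI agent system extracts from the user instruction; how that extraction happens is outside the scope of the example.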
[0060] In one embodiment, the device further includes: a second generation module, used to generate runtime information for the pages of the target website based on the state changes of the target website, the runtime information including information on visible components in the current page, page context information, and differential changes of the page.
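As a rough illustration of the runtime information described above, the sketch below derives the visible components and the differential changes from two page snapshots. The snapshot shape (a mapping from component id to a state dictionary with a `visible` flag) and the function name `runtime_info` are assumptions made for this example; the source does not prescribe them.

```python
def runtime_info(previous: dict, current: dict, context: dict) -> dict:
    """Summarize the current page and what changed since the previous snapshot.

    `previous` and `current` map a component id to its state dict (hypothetical shape).
    """
    visible = sorted(cid for cid, state in current.items() if state.get("visible"))
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    changed = sorted(
        cid for cid in set(current) & set(previous) if current[cid] != previous[cid]
    )
    return {
        "visible_components": visible,
        "context": dict(context),
        "diff": {"added": added, "removed": removed, "changed": changed},
    }
```

Reporting only the difference between snapshots lets the agent react to a state change without re-reading the whole page.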
[0061] In one embodiment, to generate the target website operation plan for the user instruction by combining the AI manifest file and the set of executable behaviors, the device includes: a third determining module, used to determine the current state of the target website based on the runtime information; a fourth determination module, used to determine the website behaviors and website performance supported by the target website based on the AI manifest file; and a third generation module, used to generate a target website operation plan for the user instruction by combining the current state, the supported website behaviors and website performance, and the set of executable behaviors.
[0062] In one embodiment, the device further includes: a second execution module, used to perform, in response to the execution of the user instruction involving a sensitive operation, a permission check of the AI agent system and a query of user authorization information for the sensitive operation; and a judgment module, used to allow the AI agent system to execute the sensitive operation if it is determined that the AI agent system has the operation permission and has user authorization information.
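The two-part check described above (operation permission plus user authorization) can be expressed as a small gate function. This is a sketch under stated assumptions: representing permissions and authorizations as sets of action names, and the function name `may_execute`, are illustrative choices not taken from the source.

```python
def may_execute(
    action: str,
    sensitive_actions: set,
    agent_permissions: set,
    user_authorizations: set,
) -> bool:
    """Allow non-sensitive actions; sensitive ones need BOTH permission and authorization."""
    if action not in sensitive_actions:
        return True
    return action in agent_permissions and action in user_authorizations
```

For example, an agent that holds permission for a hypothetical `submit_payment` action is still blocked until the user has authorized that same action.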
[0063] In one embodiment, the device further includes: a fifth determining module, used to determine the degree of matching between the execution result and the user instruction in response to obtaining the execution result; a statistics module, used to collect statistics on failed actions and error pages generated during the execution of the target website operation plan; an acquisition module, used to obtain the user's satisfaction score for the execution result; and an optimization module, used to optimize the structured prompt header and the set of executable behaviors by combining the matching degree, failed actions, error pages, and satisfaction score.
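One way the four feedback signals might be combined is sketched below. The source does not state how the optimization is performed, so everything here is an assumption made for illustration: the equal weighting of matching degree and satisfaction score (both taken to lie in [0, 1]), the thresholds `max_failures` and `min_quality`, the header keys `avoid_pages` and `quality`, and the function name `optimize`.

```python
from collections import Counter


def optimize(
    header: dict,
    executable: list,
    match_degree: float,
    failed_actions: list,
    error_pages: list,
    satisfaction: float,
    max_failures: int = 2,
    min_quality: float = 0.6,
) -> tuple:
    """Return an updated header and behavior set based on feedback from one run."""
    failures = Counter(failed_actions)
    # Assumed scoring rule: equal weight on matching degree and user satisfaction.
    quality = 0.5 * match_degree + 0.5 * satisfaction
    if quality >= min_quality:
        kept = list(executable)
    else:
        # Only when the run went poorly, drop behaviors that failed repeatedly.
        kept = [a for a in executable if failures[a] < max_failures]
    new_header = dict(header)
    new_header["avoid_pages"] = sorted(set(error_pages))
    new_header["quality"] = quality
    return new_header, kept
```

Pruning only when the overall quality is low is a deliberately conservative choice in this sketch, so that an occasional failure in an otherwise successful run does not shrink the set of executable behaviors.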
[0064] Another embodiment of the present invention also provides an electronic device, comprising: One or more processors; Memory, configured to store one or more programs; When the one or more programs are executed by the one or more processors, the one or more processors implement the AI agent and web page interaction method as described above.
[0065] Furthermore, one embodiment of the present invention also provides a storage medium storing a computer program, which, when executed by a processor, implements the artificial intelligence agent and web page interaction method described above. It should be understood that the various solutions in this embodiment have the corresponding technical effects of the above method embodiments, which will not be repeated here.
[0066] Furthermore, embodiments of the present invention also provide a computer program product, which is tangibly stored on a computer-readable medium and includes computer-readable instructions that, when executed, cause at least one processor to perform the AI agent and web page interaction method described in the embodiments above.
[0067] It should be noted that the computer storage medium of the present invention can be a computer-readable signal medium or a computer-readable storage medium, or any combination thereof. A computer-readable medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof. In the present invention, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In the present invention, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program configured for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to: wireless, wire, optical fiber, RF, etc., or any suitable combination thereof.
[0068] Furthermore, those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.
[0069] This invention is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce a means for implementing the functions specified in one or more processes of the flowchart and / or one or more boxes of the block diagram.
[0070] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including an instruction means that implements the functions specified in one or more processes of the flowchart and / or one or more boxes of the block diagram.
[0071] Those skilled in the art should understand that the discussion of any of the above embodiments is merely exemplary and is not intended to imply that the scope of protection of this application is limited to these examples; within the framework of this application, the technical features of the above embodiments or different embodiments can also be combined, the steps can be implemented in any order, and there are many other variations of different aspects of one or more embodiments of this application as described above, which are not provided in detail for the sake of brevity.
Claims
1. A method for interaction between an artificial intelligence agent and a webpage, characterized in that, applied to an AI agent system, the method includes: determining, in response to user instructions, the target website to be interacted with; reading an AI manifest file, which contains usage information of the target website, the usage information being used to provide the AI agent system with information on the behaviors supported by the target website and the website performance of the target website; loading a component-level AI contract, which contains the purpose information of each front-end component of the target website; determining the set of executable behaviors of the AI agent system on the target website based on the user instructions, the AI manifest file, and the component-level AI contract; generating, by combining the AI manifest file and the set of executable behaviors, a target website operation plan for the user instructions; and if the target website operation plan meets the execution requirements, executing the target website operation plan on the target website to obtain the execution result.
2. The AI agent and webpage interaction method according to claim 1, characterized in that, the reading of the AI manifest file includes: reading the website's public actions, data access constraints, component capability summaries, and website security policies from the AI manifest file; and the loading of the component-level AI contract includes: loading component-level AI contracts to determine the functional intent, state, input and output data, and interaction logic between different front-end components.
3. The method for AI agent and webpage interaction according to claim 1, characterized in that, The process of determining the set of executable behaviors of the AI agent system on the target website based on the user instructions, the AI manifest file, and the component-level AI contract includes: Based on the AI manifest file and component-level AI contracts, executable candidate behaviors are determined; Generate a structured prompt header corresponding to the user instruction based on the user instruction and candidate behaviors; All executable behaviors are determined by negotiating the structured prompt header and candidate behaviors, and these executable behaviors form the executable behavior set.
4. The method for AI agent and webpage interaction according to claim 1, characterized in that, The method further includes: Based on the state changes of the target website, runtime information for the pages of the target website is generated. The runtime information includes information on visible components in the current page, page context information, and differential changes of the page.
5. The AI agent and webpage interaction method according to claim 4, characterized in that, The process of generating a target website operation plan for the user instruction by combining the AI manifest file and the set of executable behaviors includes: The current state of the target website is determined based on the runtime information; Based on the AI manifest file, determine the website behaviors and website performance supported by the target website; Based on the current state, supported website behaviors and website performance, and the set of executable behaviors, a target website operation plan is generated for the user's instructions.
6. The method for AI agent and webpage interaction according to claim 1, characterized in that, the method further includes: in response to the execution of the user instruction involving a sensitive operation, performing a permission check of the AI agent system and a query of user authorization information for the sensitive operation; and if it is determined that the AI agent system has the operation permission and has user authorization information, allowing the AI agent system to perform the sensitive operation.
7. The artificial intelligence agent and webpage interaction method according to claim 3, characterized in that, the method further includes: in response to obtaining the execution result, determining the degree of matching between the execution result and the user instruction; collecting statistics on failed actions and error pages generated during the execution of the target website operation plan; obtaining the user's satisfaction score for the execution result; and optimizing the structured prompt header and the set of executable behaviors by combining the matching degree, failed actions, error pages, and satisfaction score.
8. An artificial intelligence agent and webpage interaction device, characterized in that, The device, used in an AI agent system, includes: The first determination module is used to determine the target website to be interacted with in response to user instructions; A reading module is used to read an AI manifest file, which contains usage information of the target website. This usage information is provided to the AI agent system so that it can understand the behaviors supported by the target website and the website performance of the target website. A loading module is used to load component-level AI contracts, which contain usage information for each front-end component of the target website; The second determining module is used to determine the set of executable behaviors of the AI agent system on the target website based on the user instructions, the AI manifest file, and the component-level AI contract. The first generation module is used to combine the AI manifest file and the set of executable behaviors to generate a target website operation plan for the user instructions; The first execution module is used to execute the target website operation plan on the target website and obtain the execution result if the target website operation plan meets the execution requirements.
9. An electronic device, characterized in that it comprises: one or more processors; and a memory, configured to store one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the AI agent and webpage interaction method as described in any one of claims 1-7.
10. A storage medium having a computer program stored thereon, which, when executed by a processor, implements the artificial intelligence agent and web page interaction method as described in any one of claims 1-7.