Interaction method and apparatus, electronic device, storage medium, program product, and program

The intelligent agent automatically obtains and fills in relevant information by recognizing and understanding the fields to be filled in on web pages, solving the problem of users having to manually fill in information on web pages and achieving a more efficient and convenient filling experience.

WO2026123346A1 · PCT designated stage · Publication Date: 2026-06-18 · BEIJING ZITIAO NETWORK TECH CO LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: BEIJING ZITIAO NETWORK TECH CO LTD
Filing Date: 2024-12-13
Publication Date: 2026-06-18


Abstract

The present disclosure relates to the technical fields of artificial intelligence and computers, and relates to an interaction method and apparatus, an electronic device, a storage medium, a program product, and a program. The interaction method of the present disclosure is applied to an intelligent agent, and comprises: in response to a filling function of a webpage being triggered, determining one or more items to be filled in the webpage; on the basis of semantic information of description information corresponding to the one or more items, acquiring, from interaction information between a user and the intelligent agent, related information required for filling in the one or more items; on the basis of the related information, generating content to be entered corresponding to the one or more items; and filling said content into the one or more items, and displaying a filled-in webpage.

Description

Interaction method and apparatus, electronic device, storage medium, program product, and program

Technical Field

[0001] This disclosure relates to the fields of artificial intelligence and computer technology, and in particular to an interaction method and apparatus, electronic device, storage medium, program product and program.

Background

[0002] When browsing web pages, users often encounter pages that require them to fill in information. In particular, due to the widespread use of computers, more and more forms requiring data entry are provided to users in the form of web forms. These forms may contain many input boxes and other form elements that users need to manually fill in. For example, for text input boxes, users need to manually enter text; for image upload boxes, users need to manually upload images, and so on.

Summary of the Invention

[0003] According to some embodiments of this disclosure, an interaction method is provided, applied to an intelligent agent, comprising: in response to the triggering of a webpage's fill-in function, determining one or more fields to be filled in on the webpage; obtaining, based on the semantic information of the description information corresponding to the one or more fields to be filled in, the relevant information required for filling in the one or more fields from the interaction information between the user and the intelligent agent; generating, based on the relevant information, content to be filled in corresponding to the one or more fields; and filling the content into the one or more fields and displaying the filled webpage.

[0004] According to some other embodiments of this disclosure, an interactive device is provided, comprising: a determining module configured to determine one or more fields to be filled in on the webpage in response to the triggering of a fill-in function of the webpage; an obtaining module configured to obtain, based on semantic information of the description information corresponding to the one or more fields to be filled in, the relevant information required for filling in the one or more fields from interaction information between the user and the interactive device; a generating module configured to generate, based on the relevant information, content to be filled in corresponding to the one or more fields; and a display module configured to fill the content into the one or more fields and display the filled webpage.

[0005] According to further embodiments of the present disclosure, an electronic device is provided, comprising: a memory; and a processor coupled to the memory, the processor being configured to execute an interaction method as described in any embodiment of the present disclosure based on instructions stored in the memory.

[0006] According to further embodiments of the present disclosure, a computer-readable storage medium is provided having a computer program stored thereon that, when executed by a processor, implements the interaction method of any embodiment of the present disclosure.

[0007] According to some other embodiments of the present disclosure, a computer program is provided, comprising: instructions that, when executed by a processor, cause the processor to perform the interaction method according to any embodiment of the present disclosure.

[0008] According to further embodiments of the present disclosure, a computer program product is provided, including instructions that, when executed by a processor, implement the interaction method of any embodiment of the present disclosure.

[0009] Other features, aspects, and advantages of this disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.

Brief Description of the Drawings

[0010] Embodiments of this disclosure are described below with reference to the accompanying drawings. It should be understood that the drawings described below are merely illustrative of some embodiments of this disclosure and are not intended to limit the scope of this disclosure. In the drawings:

[0011] Figure 1 shows a flowchart of the interaction method according to some embodiments of this disclosure;

[0012] Figure 2 shows a schematic diagram of the display interface of some embodiments of this disclosure;

[0013] Figure 3 shows a schematic diagram of the structure of an interactive device according to some embodiments of the present disclosure;

[0014] Figure 4 shows a schematic diagram of the structure of an electronic device according to some embodiments of the present disclosure;

[0015] Figure 5 shows a schematic diagram of the structure of an electronic device according to other embodiments of the present disclosure.

Detailed Description

[0016] The technical solutions of the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings. It should be understood that this disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein.

[0017] It should be understood that the various steps described in the method embodiments of this disclosure may be performed in different orders and / or in parallel. Furthermore, method embodiments may include additional steps and / or omit the steps shown. The scope of this disclosure is not limited in this respect. Unless otherwise specifically stated, the components and steps set forth in these embodiments should be construed as exemplary only and do not limit the scope of this disclosure.

[0018] As used in this disclosure, the term "comprising" and its variations are open-ended terms that include at least the following elements / features but do not exclude other elements / features, i.e., "including but not limited to". The term "based on" means "at least partially based on".

[0019] It should be noted that the terms "a" and "a plurality of" used in this disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless otherwise expressly indicated in the context, they should be understood as "one or more".

[0020] The user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this disclosure are all information and data authorized by the user or fully authorized by all parties. Furthermore, the collection, use and processing of the relevant data shall comply with the relevant laws, regulations and standards of the relevant countries and regions, and corresponding operation portals shall be provided for users to choose to authorize or refuse.

[0021] It should be understood that this disclosure does not limit how the image to be applied / processed is obtained. In some embodiments of this disclosure, it can be obtained from a storage device, such as internal memory or an external storage device. In other embodiments of this disclosure, a camera component can be invoked to capture an image. It should be noted that the acquired image can be a captured image or a frame from a captured video, and is not limited to these examples.

[0022] In the context of this disclosure, "image" can refer to any of a variety of images, such as color images, grayscale images, etc. It should be noted that the type of image is not specifically limited in the context of this specification. Furthermore, an image can be any suitable image, such as a raw image obtained by a camera device, or an image on which specific preprocessing has been performed, such as preliminary filtering, de-aliasing, color adjustment, contrast adjustment, normalization, etc. It should be noted that the preprocessing may also include other types of preprocessing operations known in the art, which will not be described in detail here.

[0023] The embodiments of this disclosure are described in detail below with reference to the accompanying drawings; however, this disclosure is not limited to these specific embodiments. These specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments. Furthermore, in one or more embodiments, specific features, structures, or characteristics can be combined in any suitable manner that will be apparent to those skilled in the art from this disclosure.

[0024] With the widespread use of computers, web pages are increasingly used for information collection. These web pages often include fields for users to manually enter information, requiring the entered information to conform to specific formats or requirements. Furthermore, some web pages lack sufficient explanation for these fields, forcing users to repeatedly try and guess to determine the correct information. Users often find filling out these fields tedious and have a poor experience.

[0025] This disclosure provides an interaction method applicable to an intelligent agent. This method automatically identifies one or more fields to be filled in on a webpage. Based on the semantic information of the descriptions corresponding to these fields, it obtains the necessary information from interactions with the user and automatically generates the content to be filled in the fields. This interaction method provides users with a more flexible and convenient way to fill in webpages. The intelligent agent automatically fills in the fields by obtaining the necessary information through "chat" with the user. As a webpage filling assistant, the agent makes it easier for users to complete the task, improving efficiency. Based on the agent's extensive knowledge and semantic understanding capabilities, it can more accurately generate the content to be filled in without requiring the user to understand the requirements, thus improving the overall user experience.

[0026] The interaction method of this disclosure is described below with reference to Figures 1-2 and some embodiments. The intelligent agent of this disclosure can be implemented in software or a combination of software and hardware. When the intelligent agent is implemented in software, it can be an application program, etc., and is not limited to the examples given.

[0027] Figure 1 is a flowchart of some embodiments of the interaction method of this disclosure. As shown in Figure 1, the method of this embodiment includes steps S102 to S108. The method of this embodiment can be executed by an intelligent agent.

[0028] In step S102, in response to the webpage's fill-in function being triggered, one or more fields to be filled in on the webpage are identified.

[0029] The intelligent agent can have various functions, such as an interaction function and a webpage fill-in function. When the webpage fill-in function is triggered, the intelligent agent can act as a webpage fill-in assistant and execute the interaction method disclosed herein. For example, the intelligent agent includes a browser; when the user enters a webpage address in the browser, the browser opens the webpage, which triggers the webpage fill-in function. Alternatively, the intelligent agent may display a control for the webpage fill-in function, and the function is triggered when the user activates the control. The webpage fill-in function can be triggered in various ways, not limited to the examples given, as further described in subsequent embodiments.

[0030] In response to the webpage's fill-in function being triggered, the intelligent agent can identify the content on the webpage to determine one or more fields to be filled in. These fields can also be called items to be filled in, input fields, etc., and can take one or more forms such as text boxes, file upload boxes, drop-down menus, radio buttons, and checkboxes. Any part that requires user input or selection is considered a field to be filled in, not limited to the examples given.

[0031] In step S104, based on the semantic information of the description information corresponding to one or more fields to be filled, relevant information required to fill in one or more fields is obtained from the interaction information between the user and the intelligent agent.

[0032] The description information for each of one or more fields to be filled in may include at least one of the following: name, label information, and prompt information. For example, if a field to be filled in is a text input box, the text input box may display label information and prompt information. The label information may be used to briefly explain the type of content to be filled in, while the prompt information may be used to provide more detailed explanations of the requirements for the content to be filled in or to provide the user with prompts for filling in the field. For example, if the label information is "Review", the prompt information may be "Fill in your review of this service, within 100 words".

[0033] For example, the intelligent agent can correspond to a machine learning model. By calling the machine learning model, it can perform semantic understanding of the description information of each field to be filled in, determine the semantic information, and, based on the semantic information, obtain from the interaction information with the user the relevant information needed to fill in the one or more fields. The interaction information can be historical interaction information, or interaction information that the intelligent agent obtains by interacting with the user after it has determined the description information of each field to be filled in.

[0034] In step S106, based on the relevant information, the content to be filled in corresponding to the one or more fields to be filled in is generated.

[0035] The relevant information may be non-standard or not fully compliant with the filling requirements, but it is related to one or more fields to be filled in. The relevant information may be in one or more modalities (or types), such as text, voice, document, image, or video, and is not limited to the examples given. The agent can perform one or more processing operations on the relevant information, such as understanding, extracting, summarizing, expanding, or abstracting, to generate content that meets the filling requirements of one or more fields to be filled in.

[0036] In step S108, the content to be filled is entered into one or more fields, and the completed webpage is displayed.

[0037] An intelligent agent can interact with a webpage to fill in information into one or more fields and then display the completed webpage to the user. For example, if the intelligent agent includes browser functionality, the completed webpage can be displayed in the browser, and the filling process for each field can also be shown.

[0038] The interaction method described above provides users with a more flexible and convenient way to fill out web pages. The intelligent agent automatically fills in the fields on the web page for the user by obtaining the relevant information needed during a "chat" with the user. As a web page filling assistant, the intelligent agent makes it easier for the user to complete the task, improving efficiency. Based on the intelligent agent's extensive knowledge accumulation and semantic understanding capabilities, it can more accurately generate the content to be filled in, eliminating the need for the user to understand and think about the requirements, thus improving the overall user experience.

[0039] The foregoing embodiments have described some methods for triggering the fill-in function of a webpage. In addition, there are other methods for triggering the fill-in function of a webpage. The following embodiments describe how the fill-in function of a webpage is triggered and how to determine one or more fields to be filled in on the webpage. The methods in the following embodiments can be executed by an intelligent agent.

[0040] In some embodiments, in response to a user opening a webpage, the content of the webpage is identified, and in response to identifying that the webpage includes fields to be filled in, the fill-in function of the webpage is triggered, and one or more fields to be filled in are determined.

[0041] For example, one can obtain the structural information of a webpage and then identify its content based on that information. Similarly, one can obtain an image of a webpage, perform image recognition, and thus identify the webpage's content. The structural information of a webpage can be, for example, the Document Object Model (DOM) structure. By recognizing the content of a webpage, one can determine whether it contains fields to be filled in. If so, the page's fill-in function is triggered, and one or more fields to be filled can be identified through content recognition.

[0042] With the method described in the above embodiments, the webpage's fill-in function is triggered automatically as soon as the user opens the webpage, which is convenient for the user and improves the efficiency of webpage filling.

[0043] In some embodiments, in response to a user opening a webpage and entering a fill-in instruction, the system determines that the webpage's fill-in function has been triggered based on the semantic information of the fill-in instruction, identifies the content of the webpage, and determines one or more items to be filled in the webpage.

[0044] Users can input a fill-in instruction to tell the agent to fill in information on a webpage, thereby triggering the webpage's fill-in function. For example, as shown in Figure 2, the display interface includes a webpage display area 201 and an interaction area 202. The webpage display area 201 can display the currently open webpage, which may include various forms and other content, including some fields to be filled in. The interaction area 202 can display the interaction information between the user and the agent. In the interaction area, the user can send a fill-in instruction 203, such as "Look at the content on the right, help me fill it in." Based on its semantic understanding of the fill-in instruction, the agent determines that the webpage's fill-in function has been triggered, and then recognizes the content of the webpage and identifies one or more fields to be filled in.

[0045] The method described in the above embodiments allows users to trigger automatic filling of web pages with simple commands, which facilitates user use, improves the efficiency of web page filling, and better aligns with user intent.

[0046] In some embodiments, in response to a user opening a webpage, the content of the webpage is identified; in response to identifying that the webpage includes fields to be filled in, a fill-in control is displayed; in response to the user triggering the fill-in control, it is determined that the fill-in function of the webpage has been triggered, and one or more fields to be filled in are identified.

[0047] The agent can first identify whether the webpage contains fields to be filled in. If so, it displays the fill-in control, for example as a floating layer on the current screen, although the display form is not limited to this example. In response to the fill-in control being triggered, one or more fields to be filled in can be further identified.

[0048] The method described in the above embodiments can display a fill-in control when a webpage requires information to be filled in. Users only need to trigger the fill-in control to activate the automatic filling function of the webpage, which is convenient for the user, improves the efficiency of filling in webpages, and better matches the user's intent.

[0049] In some embodiments, in response to a user opening a webpage, the content of the webpage is identified, and in response to identifying that the webpage includes fields to be filled in, whether the fill-in function of the webpage is triggered is determined based on the user's behavior data.

[0050] For example, user behavior data includes at least one of the following: the number of times the webpage's fill-in function was triggered, or the probability of its being triggered, during the user's historical interactions with the agent. For instance, if the number of times the fill-in function was triggered exceeds a threshold, or the probability exceeds a threshold, it is determined that the webpage's fill-in function has been triggered; otherwise, it is determined that the fill-in function has not been triggered. Alternatively, user behavior data can be used to determine whether to display a fill-in control. In response to the user triggering the fill-in control, it is determined that the webpage's fill-in function has been triggered, and one or more fields to be filled in are identified.

[0051] The methods for triggering the webpage's fill-in function are not limited to the examples mentioned above. For instance, the fill-in function can also be triggered in response to the user entering a URL and inputting a fill-in instruction. Alternatively, a fill-in control can be displayed in response to the user opening the webpage, and the fill-in function can be triggered in response to the user activating that control.
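By way of illustration only, the threshold check described above can be sketched as follows. The function name, the default threshold values, and the estimate of the probability as the ratio of past triggers to past page openings are all assumptions made for this sketch; the disclosure does not specify them.

```python
def fill_function_triggered(trigger_count, open_count,
                            count_threshold=3, prob_threshold=0.5):
    """Decide from historical behavior data whether to trigger the fill-in function.

    trigger_count: times the user triggered the fill-in function in the past.
    open_count: times the user opened a page containing fields (assumed input).
    Both thresholds are hypothetical values chosen for illustration.
    """
    # Assumed estimate: fraction of past page openings that led to a trigger.
    probability = trigger_count / open_count if open_count else 0.0
    return trigger_count > count_threshold or probability > prob_threshold


if __name__ == "__main__":
    print(fill_function_triggered(5, 20))  # count exceeds the threshold
    print(fill_function_triggered(1, 10))  # neither threshold is exceeded
```

The same decision could instead control whether a fill-in control is displayed, as in the alternative mentioned in the paragraph above.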

[0052] The intelligent agent can identify the content of a webpage based on its structural information and / or its images.

[0053] In some embodiments, the document object model structure of the webpage is obtained, and one or more fields to be filled in and their description information are determined based on the document object model structure. The description information includes at least one of name, label information, and prompt information.

[0054] One or more fields to be filled in and their description information can be determined directly from the DOM information or other types of structural information of a webpage. For example, elements whose tag in the DOM information is <input> are identified as fields to be filled in, and their description information can be obtained at the same time. Details can be found in existing technologies and are omitted here.

[0055] Identifying one or more fields to be filled based on the document object model structure of a webpage can improve processing efficiency and accuracy.
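As a rough sketch of DOM-based identification, the following example uses Python's standard `html.parser` module to collect form controls together with their name, label information, and prompt information. The `Field` record, the `FieldCollector` class, the choice of `<input>`, `<textarea>`, and `<select>` as form controls, the use of the `placeholder` attribute as prompt information, and the sample HTML are assumptions for illustration; the disclosure does not prescribe a parser or a data layout.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Field:
    """Hypothetical record for one field to be filled in."""
    tag: str
    name: str = ""
    label: str = ""   # label information, taken from <label for=...>
    prompt: str = ""  # prompt information, taken from the placeholder attribute


class FieldCollector(HTMLParser):
    """Collects form controls and the text of labels that point at them."""

    FORM_TAGS = {"input", "textarea", "select"}
    SKIPPED_TYPES = {"hidden", "submit", "button"}

    def __init__(self):
        super().__init__()
        self.fields = {}        # element id (or name) -> Field
        self.labels = {}        # target element id -> label text
        self._label_for = None  # id targeted by the <label> currently open

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "label":
            self._label_for = a.get("for")
            if self._label_for is not None:
                self.labels[self._label_for] = ""
        elif tag in self.FORM_TAGS and a.get("type") not in self.SKIPPED_TYPES:
            key = a.get("id") or a.get("name") or f"field{len(self.fields)}"
            self.fields[key] = Field(tag, a.get("name") or "",
                                     prompt=a.get("placeholder") or "")

    def handle_data(self, data):
        if self._label_for is not None:
            self.labels[self._label_for] += data

    def handle_endtag(self, tag):
        if tag == "label":
            self._label_for = None


def extract_fields(html):
    """Return the fields to be filled in, in document order."""
    collector = FieldCollector()
    collector.feed(html)
    collector.close()
    for key, text in collector.labels.items():
        if key in collector.fields:
            collector.fields[key].label = text.strip()
    return list(collector.fields.values())


SAMPLE_HTML = """
<form>
  <label for="review">Review</label>
  <textarea id="review" name="review"
            placeholder="Fill in your review of this service, within 100 words"></textarea>
  <label for="receipt">Receipt</label>
  <input id="receipt" type="file" name="receipt">
  <input type="hidden" name="token" value="x">
</form>
"""

if __name__ == "__main__":
    for field in extract_fields(SAMPLE_HTML):
        print(field)
```

A real page may describe its fields in other ways (for example, labels that wrap the control, or ARIA attributes), which this sketch does not handle.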

[0056] In some embodiments, an image of a webpage is acquired, the image of the webpage is identified, and one or more fields to be filled in and their descriptive information are determined.

[0057] Images of webpages are acquired with the user's prior authorization. For example, the intelligent agent can correspond to a machine learning model, and the machine learning model can be used to recognize the image of the webpage and determine one or more fields to be filled in and their descriptive information.

[0058] When the structural information of a webpage is available, content identification can be prioritized based on this information to improve efficiency. Identifying the content based on the webpage's structural information yields a first result. Further identification of the webpage's image can then yield a second result, and the two results are matched against each other. The first result can include one or more fields to be filled in, along with their descriptions, determined from the webpage's structural information. The second result can include one or more fields to be filled in, along with their descriptions, determined from the webpage's image. If the first and second results are identical, the identification is considered accurate, and one or more fields to be filled in and their descriptions can be obtained. If the number of fields to be filled in is the same in both results, but the descriptions differ, the first result takes precedence. If the second result includes more fields to be filled in than the first result, the fields not present in the first result can be identified and merged into it. Combining these two methods improves the accuracy of identifying one or more fields to be filled in and their descriptions.
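The matching rules above can be sketched as a small merge function. This is a minimal illustration that assumes each result is a mapping from a field key to its description and that the same field carries the same key in both results; how fields from the two results are actually aligned is not specified in the disclosure, and `merge_results` is a hypothetical name.

```python
def merge_results(first, second):
    """Merge the structure-based result (first) with the image-based result (second).

    Both arguments map a field key to its description (assumed representation).
    """
    merged = dict(first)  # descriptions from the first result take precedence
    for key, description in second.items():
        if key not in merged:  # field found only in the webpage image
            merged[key] = description
    return merged


if __name__ == "__main__":
    first = {"name": "Full name", "review": "Review, within 100 words"}
    second = {"name": "Name", "review": "Review", "photo": "Upload a photo"}
    print(merge_results(first, second))
```

When the two results are identical the function returns the first result unchanged, which corresponds to the case in which the identification is considered accurate.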

[0059] After the intelligent agent identifies one or more fields to be filled in on a webpage and their descriptions, it can further obtain relevant information needed to fill in the fields from the user's interaction with the intelligent agent, based on the semantic information of the descriptions corresponding to the fields. The following describes how to obtain this relevant information using some embodiments; the methods described in these embodiments can be executed by the intelligent agent.

[0060] In some embodiments, guidance information is generated based on semantic information of the description information of one or more fields to be filled, wherein the guidance information is used to guide the user to provide relevant information; the guidance information is displayed; in response to receiving information provided by the user based on the guidance information, it is determined whether the provided information is related to one or more fields to be filled, wherein the interaction information between the user and the agent includes the provided information; in response to the provided information being related to one or more fields to be filled, the provided information is determined to be relevant information.

[0061] The intelligent agent can guide users to provide relevant information through guidance information. For example, as shown in Figure 2, guidance information can be displayed in the interaction area 202. Guidance information can be generated based on the semantic information of the webpage content and the semantic information of the descriptions of one or more fields to be filled in. By combining the webpage content and the descriptions of one or more fields to be filled in, the filling requirements of the fields can be understood more accurately, thus generating more accurate guidance information.

[0062] Users can provide information in a more convenient way, such as uploading a document or briefly describing some relevant content, without having to strictly follow the requirements of the fields to be filled in.

[0063] The method described in the above embodiment adopts an interactive guidance approach, using guidance information to guide users to input relevant information. This eliminates the need for users to actively think about and understand the filling requirements and content of one or more fields to be filled, thereby improving filling efficiency and providing a better user experience.

[0064] In some embodiments, based on the semantic information of the description information of each of the one or more fields to be filled, the data type of the content to be filled corresponding to each field to be filled, the filling instructions, and at least one example are determined; and guidance information is generated based on the data type of the content to be filled corresponding to each field to be filled, the filling instructions, and at least one example.

[0065] By combining the semantic information of the webpage content with the semantic information of the descriptions of each field to be filled in, the data type, filling instructions, and at least one example for each field can be determined more accurately. For example, in addition to semantic understanding of the descriptions of each field, semantic understanding can also be performed on the webpage title, the webpage's explanatory information, and the context of each field. Since the prompt information and label information of the fields to be filled in are generally brief, the generated filling instructions can be more detailed to give users more specific guidance and, combined with examples, better guide users in providing relevant information.

[0066] The data type of the content to be filled in can be a modality, such as text, document, image, video, etc.; it can also include more specific formats, such as PDF (Portable Document Format), JPG (Joint Photographic Experts Group), etc.; and it can also include value types, such as integer, floating point, etc.

[0067] Guidance information can include instructions on the relevant information, such as directly telling the user which materials are required. For example, if the content to be filled in includes order information, the guidance information could be "Please provide me with a screenshot of your order." The modality and content of the relevant information to be requested from the user can be determined based on the semantic information of the description of each field to be filled in, so as to minimize user operations and generate appropriate guidance information.

[0068] The method described above generates guidance information including the data type of the content to be filled in for each field, filling instructions, and at least one example. This allows users to better understand the relevant information that needs to be provided, especially for some highly technical, complex, and difficult-to-understand fields. This can better assist users in understanding and improve filling efficiency.
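As a simple illustration of how guidance information might be assembled once the data type, filling instructions, and an example have been determined for each field, consider the sketch below. In the disclosure these three items are derived by semantic understanding; here they are supplied by hand, and the `FieldGuidance` record, the `build_guidance` function, and the message wording are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FieldGuidance:
    """Hypothetical per-field guidance inputs."""
    label: str
    data_type: str
    instructions: str
    example: str


def build_guidance(items):
    """Render one numbered guidance line per field to be filled in."""
    lines = ["To fill in this page, please provide the following:"]
    for index, item in enumerate(items, start=1):
        lines.append(
            f"{index}. {item.label} ({item.data_type}): "
            f"{item.instructions} Example: {item.example}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_guidance([
        FieldGuidance("Review", "text",
                      "Describe your experience with the service in at most 100 words.",
                      "The delivery was fast and the staff were helpful."),
        FieldGuidance("Order information", "image",
                      "Please provide me with a screenshot of your order.",
                      "order_screenshot.jpg"),
    ]))
```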

[0069] In some embodiments, search terms are generated based on the semantic information of the webpage description and the semantic information of the description of each field to be filled in; a search is performed based on the search terms; and based on the search results, the data type of the content to be filled in, the filling instructions, and at least one example corresponding to each field to be filled in are determined.

[0070] A webpage's description can include at least one of its title and explanatory information. A search can be used to obtain more information such as filling examples and instructions for each field to be filled in. Then, a machine learning model can be used to summarize the search results and generate guidance information. The search results can be in one or more modalities, such as text, images, and videos. Semantic understanding of this content can be performed to generate guidance information, which can also be in one or more modalities. Alternatively, machine learning models can be used directly to generate guidance information based on the semantic information of the webpage's description and the semantic information of each field to be filled in.

[0071] The method described in the above embodiment, by searching for relevant content for each field to be filled in, and then summarizing the search results to generate guidance information, can improve the accuracy of the guidance information and thus more accurately guide users to provide relevant information.

[0072] The information provided by the user can take one or more modalities, such as text, document, image, video, audio, etc.

[0073] In some embodiments, the modality of the provided information is identified; the provided information is parsed according to its modality to determine the semantic information of the provided information; and whether the provided information is related to one or more items to be filled is determined based on the semantic information of the provided information and the semantic information of the description information of the one or more items to be filled.

[0074] After a user provides information, it is necessary to identify the modality of that information. Based on the modality of the provided information, the corresponding machine learning model can be invoked to parse the information and determine its semantic information.

[0075] For example, a user may provide a form document for one or more fields to be filled in on a form. The form document may include information related to the one or more fields to be filled in as well as unrelated information, and the intelligent agent needs to automatically distinguish between the two.

[0076] The method described in the above embodiments can accurately identify the information provided by the user even when the information belongs to different modalities, and match it with one or more fields to be filled in, so that users can provide relevant information more conveniently and flexibly, and improve the efficiency of the entire filling process.
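The modality identification and dispatch described above can be illustrated with a minimal sketch. It assumes that a typed message carries no file name and that uploaded files can be classified from the MIME type guessed from their name; the function names and the entries of `PARSERS` are hypothetical stand-ins for the per-modality machine learning models and are not part of this disclosure.

```python
import mimetypes
from typing import Callable, Dict, Optional


def identify_modality(filename: Optional[str]) -> str:
    """Return 'text', 'image', 'video', 'audio', or 'document'.

    Assumption: a typed chat message has no file name (None) and is text;
    uploaded files are classified by the MIME type guessed from their name.
    """
    if filename is None:
        return "text"
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        return "document"  # unknown file types are treated as documents
    top_level = mime.split("/")[0]
    if top_level in ("image", "video", "audio"):
        return top_level
    return "text" if mime == "text/plain" else "document"


# Hypothetical parsers standing in for per-modality machine learning models.
PARSERS: Dict[str, Callable[[str], str]] = {
    "text": lambda item: f"[text model] {item}",
    "image": lambda item: f"[image model] {item}",
    "video": lambda item: f"[video model] {item}",
    "audio": lambda item: f"[audio model] {item}",
    "document": lambda item: f"[document model] {item}",
}


def parse_provided_information(item: str, filename: Optional[str] = None) -> str:
    """Invoke the parser that corresponds to the identified modality."""
    modality = identify_modality(filename)
    return PARSERS[modality](item)
```

In practice the modality would more likely be identified from the content itself rather than from the file name; the name-based rule is used here only to keep the sketch self-contained.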

[0077] If user-provided information is used as relevant information for filling in one or more fields, the following method can be used to generate the content to be filled for one or more fields. In some embodiments, based on the semantic information of the provided information and the semantic information of the descriptive information of each of the one or more fields, the provided information is matched with each field to determine the information in the provided information that matches each field; based on the information in the provided information that matches each field, the content to be filled for each field is generated.

[0078] The information provided by the user may not differentiate between the information corresponding to each field to be filled in. For example, the user may provide a document containing information that matches one or more fields to be filled in, and may also include other information. The agent needs to match the information provided by the user with each field to be filled in, identify the information in the provided information that matches each field to be filled in, and then generate the content to be filled in for each field.

[0079] While giving users more flexible and convenient ways to provide information, the method described above can generate more accurate content to be filled in for each field, thereby improving the accuracy of webpage filling.
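The matching of provided information with each field can be sketched as follows. This is an illustrative example only: token overlap (Jaccard similarity) is used as a simple stand-in for the semantic matching performed by the intelligent agent, and the function names, the snippet format, and the threshold value are assumptions.

```python
import re
from typing import Dict, List, Optional, Set


def tokens(text: str) -> Set[str]:
    """Lower-case alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_fields(
    field_descriptions: Dict[str, str],
    snippets: List[str],
    threshold: float = 0.3,
) -> Dict[str, Optional[str]]:
    """For each field, pick the snippet whose tokens overlap most with the
    field description; fields with no sufficiently similar snippet map to None."""
    result: Dict[str, Optional[str]] = {}
    for field, description in field_descriptions.items():
        desc_tokens = tokens(description)
        best, best_score = None, 0.0
        for snippet in snippets:
            snippet_tokens = tokens(snippet)
            union = desc_tokens | snippet_tokens
            score = len(desc_tokens & snippet_tokens) / len(union) if union else 0.0
            if score > best_score:
                best, best_score = snippet, score
        result[field] = best if best_score >= threshold else None
    return result


fields = {
    "order_no": "Order number",
    "email": "Contact email address",
    "passport": "Passport scan",
}
provided = [
    "Order number: 12345",
    "Email address: a@b.com",
    "Weather is nice today",
]
matched = match_fields(fields, provided)
```

Here the unrelated snippet is ignored and the field with no matching information maps to `None`, which corresponds to a field for which the user still needs to be guided to provide information.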

[0080] The above embodiments describe how an agent can guide a user to input relevant information, obtaining such information through real-time interaction between the agent and the user. The following embodiments describe how to obtain relevant information through historical interactions between the agent and the user.

[0081] In some embodiments, memory information corresponding to the user is obtained, wherein the memory information includes stored key information corresponding to the user, which is extracted from the interaction information between the user and the intelligent agent; relevant information is determined from the memory information based on the semantic information of the description information corresponding to one or more items to be filled.

[0082] During interactions with users, the intelligent agent extracts key information to generate memory information. The process of generating and using this memory information is pre-authorized by the user. For example, user attributes, preferences, or user-uploaded documents and images can be used as key information to generate memory information. The types of key information to be extracted can be configured by the user or determined automatically by the intelligent agent during the interaction.

[0083] Many fields for filling in the same information may exist on different web pages or forms. However, information cannot be shared between different web pages or forms, requiring users to manually fill them in each time. The method described in the above embodiment can generate memory information for the user, from which relevant information for filling in one or more fields can be determined. This eliminates the need for repeated interaction with the user to obtain relevant information, improving filling efficiency and enhancing the user experience.

[0084] In some embodiments, during the interaction between the user and the intelligent agent, key information corresponding to the user is extracted from the input information based on the semantic information of the user's input information, wherein the interaction information between the user and the intelligent agent includes the input information; and memory information is generated based on the key information corresponding to the user.

[0085] The intelligent agent can semantically understand the information input by the user during interaction and extract key information to generate memory information. This memory information is a distillation and summary of historical interaction information and differs from the actual historical interaction information between the user and the intelligent agent. The effective duration of the memory information can be configured, and it can be updated to ensure greater accuracy when filling out web pages later. As a web page filling assistant, the intelligent agent, through the methods described above, generates and stores memory information, providing more assistance to users during web page filling, eliminating the need for users to repeatedly provide the same information, and improving web page filling efficiency.
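A memory store with a configurable effective duration and update support can be sketched as follows. The class and method names are hypothetical, and the key information is assumed to be simple key-value pairs; the disclosure does not prescribe a particular storage structure.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MemoryEntry:
    value: str
    stored_at: float      # time at which the key information was stored
    ttl_seconds: float    # effective duration of this entry


class UserMemory:
    """Per-user memory information with a configurable effective duration."""

    def __init__(self, default_ttl: float = 30 * 24 * 3600) -> None:
        self.default_ttl = default_ttl
        self._entries: Dict[str, MemoryEntry] = {}

    def remember(self, key: str, value: str,
                 now: Optional[float] = None,
                 ttl: Optional[float] = None) -> None:
        """Store or update a piece of key information."""
        now = time.time() if now is None else now
        ttl = self.default_ttl if ttl is None else ttl
        self._entries[key] = MemoryEntry(value, now, ttl)

    def recall(self, key: str, now: Optional[float] = None) -> Optional[str]:
        """Return the stored value, or None if it is absent or has expired."""
        now = time.time() if now is None else now
        entry = self._entries.get(key)
        if entry is None:
            return None
        if now - entry.stored_at > entry.ttl_seconds:
            del self._entries[key]  # expired memory is discarded
            return None
        return entry.value
```

The explicit `now` parameter exists only to make the expiry behavior easy to demonstrate; a real store would also need the user's pre-authorization before anything is written.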

[0086] In some embodiments, historical interaction information corresponding to the user is obtained, wherein the interaction information between the user and the intelligent agent includes historical interaction information; relevant information is obtained from the historical interaction information based on the semantic information of the description information of one or more items to be filled in and the semantic information of the historical interaction information.

[0087] Historical interaction information can be information about interactions within a certain time range prior to the current time. For example, before triggering the fill-in function on a webpage, the user may have provided some information through interaction with the agent. The agent can then use semantic understanding to obtain the relevant information needed to fill in one or more fields.

[0088] The method described in the above embodiments automatically obtains relevant information from the historical interaction information between the intelligent agent and the user, reduces the number of times the user has to repeatedly provide relevant information, improves the efficiency of filling out web pages, and enhances the user experience.

[0089] The methods described above for obtaining relevant information for filling in one or more fields from the interaction information between the user and the intelligent agent can be used in combination.

[0090] In some embodiments, the following steps are taken: First, the user's stored information is obtained, wherein the stored information includes at least one of historical interaction information and memory information. The user's interaction information with the intelligent agent includes historical interaction information, and the memory information includes stored key information corresponding to the user, which is extracted from the user's interaction information with the intelligent agent. Second, based on the semantic information of the description information of one or more fields to be filled, it is determined whether some or all of the relevant information exists in the stored information. Third, in response to the existence of all relevant information in the stored information, the relevant information is obtained from the stored information. Fourth, in response to the absence of relevant information or the presence of some relevant information in the stored information, one or more fields to be filled or the fields corresponding to the missing relevant information are taken as target fields. Based on the semantic information of the description information of the target fields, guidance information is generated, wherein the guidance information is used to guide the user to provide the relevant information required to fill in the target field. Fifth, the guidance information is displayed. Sixth, in response to receiving information provided by the user based on the guidance information, it is determined whether the provided information is related to the target field, wherein the user's interaction information with the intelligent agent includes the provided information. Seventh, in response to the provided information being related to the target field, the provided information is determined to be the relevant information required to fill in the target field.

[0091] The system can first determine whether some or all of the relevant information is included in the agent's historical interaction information with the user and / or the agent's memory information. If all of the relevant information is included, the user does not need to provide further information. If the relevant information is not included or is only partially included, the system can further guide the user to provide the missing relevant information.

[0092] The method described above can reduce the number of times users need to provide information, and can also accurately obtain all relevant information required to fill in one or more fields, thereby improving the efficiency and accuracy of webpage filling.
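The combined flow above, which checks the stored information first and guides the user only for what is missing, can be sketched as follows. Exact key lookup is used here as a stand-in for the semantic determination made by the intelligent agent, and the function names and guidance wording are assumptions.

```python
from typing import Dict, List, Tuple


def plan_acquisition(
    fields: List[str], stored: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Split the fields into those already covered by stored information
    and the target fields for which the user must still be guided."""
    found = {f: stored[f] for f in fields if f in stored}
    targets = [f for f in fields if f not in stored]
    return found, targets


def guidance_for(targets: List[str]) -> str:
    """Generate guidance information for the target fields (empty if none)."""
    if not targets:
        return ""
    return "Please provide the following information: " + ", ".join(targets)


fields = ["name", "email", "order screenshot"]
stored = {"name": "Li Hua", "email": "a@b.com"}
found, targets = plan_acquisition(fields, stored)
message = guidance_for(targets)
```

When `targets` is empty, all relevant information already exists in the stored information and no guidance is displayed.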

[0093] After obtaining the relevant information required to fill in one or more fields, it is necessary to generate the content to be filled in for each field based on the relevant information.

[0094] In some embodiments, a target modality corresponding to each of the one or more fields to be filled is determined based on the semantic information of the description information of one or more fields to be filled; and content to be filled corresponding to each field to be filled is generated based on the target modality and related information of each field to be filled, presented in the target modality.

[0095] Some fields to be filled are text input boxes, with a corresponding target modality of text; others are file upload boxes, with corresponding target modalities such as images, documents, videos, or audio. The target modality for each field can be determined by combining the type of the field with the semantic information of its description.
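As an illustration, the target modality can be derived from the field type and its description. The keyword lists below are assumptions that stand in for the semantic understanding performed by the intelligent agent, and the function name is hypothetical.

```python
from typing import Dict, Tuple

# Assumed keyword hints per modality; a real agent would rely on semantic
# understanding of the description rather than on keyword lookup.
MODALITY_HINTS: Dict[str, Tuple[str, ...]] = {
    "image": ("photo", "image", "picture", "screenshot"),
    "video": ("video",),
    "audio": ("audio", "voice"),
}


def target_modality(field_type: str, description: str = "") -> str:
    """Return the modality in which the field content should be presented."""
    if field_type != "file":
        return "text"  # text input boxes take text content
    lowered = description.lower()
    for modality, hints in MODALITY_HINTS.items():
        if any(hint in lowered for hint in hints):
            return modality
    return "document"  # default for file upload boxes without a clearer hint
```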

[0096] The modality of relevant information obtained from user-agent interactions may not be the same as the target modality. Therefore, functions or machine learning models corresponding to the target modality can be invoked to generate the content to be filled in for each field based on the relevant information. For example, if the relevant information is an image and the target modality is text, the image can be converted into text. If a field requires a specific data type, format, and / or size, the generated content must also conform to that specific data type, format, and / or size. For example, a field might require integer data, an image size not exceeding 10MB, or a PDF document format.

[0097] For fields with options, the generated field content can be the corresponding selection information. When the field is subsequently filled in, it can be filled in by, for example, checking the corresponding option.

[0098] The content to be filled in for each generated field must meet the requirements of that field. For example, if a field has a word limit, the content can be expanded or abbreviated based on the relevant information to meet that word limit.
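The requirement checks described above can be sketched as follows. The `FieldRequirement` structure and function names are hypothetical, and truncation is only a crude stand-in for the abbreviation that a machine learning model would perform.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldRequirement:
    data_type: str = "text"          # "text" or "integer"
    max_words: Optional[int] = None  # word limit of the field, if any


def fit_content(raw: str, requirement: FieldRequirement) -> str:
    """Adapt generated content so that it meets the field's requirements."""
    if requirement.data_type == "integer":
        match = re.search(r"-?\d+", raw)
        if match is None:
            raise ValueError("no integer found in the relevant information")
        return str(int(match.group()))
    words = raw.split()
    if requirement.max_words is not None and len(words) > requirement.max_words:
        # Truncation stands in for model-based abbreviation.
        words = words[: requirement.max_words]
    return " ".join(words)


def within_size_limit(num_bytes: int, max_megabytes: int = 10) -> bool:
    """Check a file against a size limit such as 'not exceeding 10MB'."""
    return num_bytes <= max_megabytes * 1024 * 1024
```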

[0099] The method described in the above embodiments automatically generates the content to be filled in for each field presented in the target modality based on relevant information, which improves the efficiency of filling in web pages, makes it more convenient for users, and enhances the user experience.

[0100] After generating the content to be filled in for each field, the agent can automatically fill in the content and display the completed webpage. Users can then modify the content on the webpage, and the agent can store the modified content along with the corresponding field, or generate memory information from it, for use when filling in subsequent webpages.

[0101] In some embodiments, the content to be filled in for each field is displayed in the interactive area; in response to the user's confirmation operation, the content to be filled in for each field is entered into each field, and the filled webpage is displayed; in response to the user's modification operation on the content to be filled in for a certain field, the modified content to be filled in for each field is entered into each field, and the filled webpage is displayed.

[0102] As shown in Figure 2, the content to be filled in for each field can be displayed in the interaction area 202 for user confirmation or modification. Users can also interact with the agent regarding the content to be filled in for a specific field, instructing the agent on how to modify it. For example, in response to receiving a user's modification instruction for the content to be filled in for a specific field, the agent regenerates and displays the content to be filled in for that field based on the instruction.

[0103] The method described in the above embodiments provides users with a more flexible way to make modifications. Users can instruct the intelligent agent to modify unsatisfactory parts, which is more in line with the user's intentions and improves the user experience.

[0104] This disclosure also provides an interactive device that can be deployed in an intelligent agent, as described below with reference to Figure 3.

[0105] Figure 3 is a structural diagram of some embodiments of the interactive device of this disclosure. As shown in Figure 3, the interactive device of this embodiment includes: a determination module 310, an acquisition module 320, a generation module 330, and a display module 340.

[0106] The determination module 310 is configured to determine one or more fields to be filled in on the webpage in response to the webpage's fill-in function being triggered.

[0107] The acquisition module 320 is configured to acquire relevant information required to fill in one or more fields from the interaction information between the user and the intelligent agent, based on the semantic information of the description information corresponding to one or more fields to be filled.

[0108] The generation module 330 is configured to generate fill-in content for one or more fill-in items based on relevant information.

[0109] The display module 340 is configured to fill in the content to be filled into one or more fields and display the completed webpage.

[0110] In some embodiments, the generation module 330 is configured to determine the target modality corresponding to each of the one or more fill-in items based on the semantic information of the description information of the one or more fill-in items; and generate the fill-in content corresponding to each fill-in item presented in the target modality based on the target modality and related information of each fill-in item.

[0111] In some embodiments, the acquisition module 320 is configured to generate guidance information based on the semantic information of the description information of one or more fields to be filled in, wherein the guidance information is used to guide the user to provide relevant information; the display module 340 is configured to display the guidance information; the acquisition module 320 is further configured to, in response to receiving information provided by the user based on the guidance information, determine whether the provided information is related to one or more fields to be filled in, wherein the interaction information between the user and the agent includes the provided information; and in response to the provided information being related to one or more fields to be filled in, determine the provided information as relevant information.

[0112] In some embodiments, the acquisition module 320 is configured to identify the modality of the provided information; parse the provided information according to the modality of the provided information to determine the semantic information of the provided information; and determine whether the provided information is related to one or more items to be filled based on the semantic information of the provided information and the semantic information of the description information of one or more items to be filled.

[0113] In some embodiments, the acquisition module 320 is configured to determine the data type, filling instructions, and at least one example of the content to be filled in for each of the one or more fields to be filled in, based on the semantic information of the description information of each field to be filled in; and to generate guidance information based on the data type, filling instructions, and at least one example of the content to be filled in for each field to be filled in.

[0114] In some embodiments, the acquisition module 320 is configured to generate search terms based on the semantic information of the webpage description and the semantic information of the description of each field to be filled; perform a search based on the search terms; and determine the data type of the content to be filled, the filling instructions, and at least one example for each field to be filled based on the search results.

[0115] In some embodiments, the generation module 330 is configured to match the provided information with each item to be filled based on the semantic information of the provided information and the semantic information of the description information of each item to be filled in one or more items to be filled, and determine the information in the provided information that matches each item to be filled; and generate the content to be filled for each item to be filled based on the information in the provided information that matches each item to be filled.

[0116] In some embodiments, the acquisition module 320 is configured to acquire memory information corresponding to the user, wherein the memory information includes stored key information corresponding to the user, the key information being extracted from the interaction information between the user and the intelligent agent; and to determine relevant information from the memory information based on the semantic information of the description information corresponding to one or more items to be filled.

[0117] In some embodiments, the interaction device 30 further includes a memory module 350 configured to extract key information corresponding to the user from the input information based on the semantic information of the user's input information during the interaction between the user and the intelligent agent, wherein the interaction information between the user and the intelligent agent includes the input information; and to generate memory information based on the key information corresponding to the user.

[0118] In some embodiments, the acquisition module 320 is configured to acquire historical interaction information corresponding to the user, wherein the interaction information between the user and the intelligent agent includes historical interaction information; and to acquire relevant information from the historical interaction information based on the semantic information of the description information of one or more items to be filled in and the semantic information of the historical interaction information.

[0119] In some embodiments, the acquisition module 320 is configured to acquire stored information corresponding to the user, wherein the stored information includes at least one of historical interaction information and memory information, the user's interaction information with the intelligent agent includes historical interaction information, and the memory information includes stored key information corresponding to the user, the key information being extracted from the user's interaction information with the intelligent agent; determine whether there is some or all relevant information in the stored information based on the semantic information of the description information of one or more items to be filled; in response to the existence of all relevant information in the stored information, acquire relevant information from the stored information; in response to the absence of relevant information or the presence of some relevant information in the stored information, take one or more items to be filled or the items corresponding to the missing relevant information as target items, and generate guidance information based on the semantic information of the description information of the target items, wherein the guidance information is used to guide the user to provide relevant information required to fill in the target items; the display module 340 is configured to display the guidance information; the acquisition module 320 is further configured to, in response to receiving information provided by the user based on the guidance information, determine whether the provided information is related to the target item, wherein the user's interaction information with the intelligent agent includes the provided information; in response to the provided information being related to the target item, determine the provided information as relevant information required to fill in the target item.

[0120] In some embodiments, the determining module 310 is configured to: in response to a user opening a webpage, identify the content of the webpage; in response to identifying that the webpage includes fields to be filled in, determine that the filling function of the webpage is triggered and determine one or more fields to be filled in; or in response to a user opening a webpage and inputting a filling instruction, determine that the filling function of the webpage is triggered based on the semantic information of the filling instruction, identify the content of the webpage, and determine one or more fields to be filled in the webpage; or in response to a user opening a webpage, identify the content of the webpage; in response to identifying that the webpage includes fields to be filled in, display a filling control; in response to the user triggering the filling control, determine that the filling function of the webpage is triggered and determine one or more fields to be filled in.

[0121] In some embodiments, the determining module 310 is configured to obtain the document object model structure of the webpage, and determine one or more fields to be filled in and their description information in the webpage based on the document object model structure, wherein the description information includes at least one of name, tag information, and prompt information; and / or obtain an image of the webpage, identify the image of the webpage, and determine one or more fields to be filled in and their description information in the webpage, wherein obtaining the image of the webpage is pre-authorized by the user.
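Determining the fields to be filled in and their description information from the document object model structure can be illustrated with a minimal sketch. It assumes that the webpage is available as HTML source and uses the standard-library `html.parser` module; the class name, the output dictionary keys, and the rule for skipping hidden and button-type inputs are assumptions.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class FieldExtractor(HTMLParser):
    """Collect form fields with their name, type, prompt, and label text."""

    SKIPPED_TYPES = ("hidden", "submit", "button", "reset")

    def __init__(self) -> None:
        super().__init__()
        self.fields: List[Dict[str, str]] = []
        self.labels: Dict[str, str] = {}          # label text keyed by field id
        self._label_for: Optional[str] = None     # id targeted by the open <label>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "label":
            self._label_for = attributes.get("for")
        elif tag in ("input", "textarea", "select"):
            if attributes.get("type") in self.SKIPPED_TYPES:
                return
            default_type = "text" if tag == "input" else tag
            self.fields.append({
                "id": attributes.get("id") or "",
                "name": attributes.get("name") or "",
                "type": attributes.get("type") or default_type,
                "prompt": attributes.get("placeholder") or "",
            })

    def handle_data(self, data):
        if self._label_for and data.strip():
            self.labels[self._label_for] = data.strip()

    def handle_endtag(self, tag):
        if tag == "label":
            self._label_for = None


def extract_fields(html: str) -> List[Dict[str, str]]:
    """Return the fields to be filled in together with their descriptions."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    for field in parser.fields:
        field["label"] = parser.labels.get(field["id"], "")
    return parser.fields


page = """
<form>
  <label for="email">Email address</label>
  <input id="email" name="email" type="email" placeholder="you@example.com">
  <label for="cv">Resume</label>
  <input id="cv" name="cv" type="file">
  <input type="hidden" name="token" value="x">
  <input type="submit" value="Send">
</form>
"""
found_fields = extract_fields(page)
```

Labels are attached after parsing, so a label that appears after its field in the markup is still associated with it. Pages that render fields without standard form elements would need the image-based recognition path instead.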

[0122] In some embodiments, the display module 340 is configured to display the content to be filled in for each field in the interactive area; in response to the user's confirmation operation, fill in the content to be filled in for each field and display the filled webpage; in response to the user's modification operation on the content to be filled in for a certain field, fill in the modified content to be filled in for each field and display the filled webpage.

[0123] Figure 4 shows a block diagram of an electronic device according to some embodiments of the present disclosure.

[0124] Memory 41 is used to store one or more computer-readable instructions. Memory 41 may include any combination of various forms of computer-readable storage media, such as volatile memory and / or non-volatile memory, including but not limited to random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), read-only memory (ROM), and flash memory. Memory 41 may, for example, store operating systems, application programs, boot loaders, databases, and other programs, as well as various data.

[0125] The processor 42 is configured to execute computer-readable instructions to implement the interactive method or the method described in any of the foregoing embodiments. Specific implementations of each step of the method can be found in the above embodiments; repeated details will not be elaborated upon here.

[0126] Processor 42 can be configured to perform the steps of the interactive method in any embodiment of this disclosure. Processor 42 can be embodied in various processing devices, such as a central processing unit (CPU) or a network processor (NP); it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The central processing unit (CPU) can be of an x86 architecture, an ARM architecture, etc.

[0127] The processor 42 and the memory 41 can communicate with each other directly or indirectly. For example, the processor 42 and the memory 41 can communicate via a network. The network can include a wireless network, a wired network, and / or any combination of wireless and wired networks. The processor 42 and the memory 41 can also communicate with each other via a system bus, which is not limited in this disclosure.

[0128] It should be noted that the components of the electronic device 4 shown in Figure 4 are exemplary and not limiting. The electronic device 4 may have other components depending on the actual application requirements. The processor 42 can control other components in the electronic device 4 to perform the desired functions.

[0129] Electronic device 4 can be implemented by software, firmware and / or hardware, and can be integrated into a device with the relevant application installed.

[0130] Figure 5 shows a block diagram of an electronic device according to other embodiments of the present disclosure.

[0131] The electronic device 5 shown in Figure 5 can be a computer system with a dedicated hardware structure, capable of performing corresponding functions when relevant applications are installed.

[0132] Electronic devices include, but are not limited to, mobile terminals such as smartphones, laptops, personal digital assistants (PDAs), tablet computers, portable multimedia players (PMPs), in-vehicle terminals (such as in-vehicle navigation terminals), and wearable devices, as well as fixed terminals such as digital televisions and desktop computers.

[0133] As shown in Figure 5, the Central Processing Unit (CPU) 51 performs various processes based on programs stored in the Read-Only Memory (ROM) 52 or programs loaded from the storage section 58 into the Random Access Memory (RAM) 53. The RAM 53 stores data required as needed when the CPU 51 performs various processes. The CPU is merely exemplary and can also be other types of processors, such as the various processors described above. The ROM 52, RAM 53, and storage section 58 can be various forms of computer-readable storage media. It should be noted that although the ROM 52, RAM 53, and storage section 58 are shown separately in Figure 5, one or more of them can be combined or located in the same or different memories or storage modules.

[0134] CPU 51, ROM 52 and RAM 53 are interconnected via bus 54. Input / output interface 55 is also connected to bus 54.

[0135] The following components are connected to the input / output interface 55: input section 56, such as a touchscreen, touchpad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, etc.; output section 57, including displays such as a cathode ray tube (CRT) or liquid crystal display (LCD), as well as speakers, vibrators, etc.; storage section 58, including hard disks, magnetic tapes, etc.; and communication section 59, including network interface cards such as LAN cards, modems, etc. The communication section 59 allows communication processing to be performed via a network such as the Internet. It is readily understood that although parts of the electronic device 5 shown in Figure 5 communicate via bus 54, they can also communicate via a network or other means, wherein the network can include wireless networks, wired networks, and / or any combination of wireless and wired networks.

[0136] As needed, drive 510 is also connected to input / output interface 55. Removable media 511, such as disks, optical discs, magneto-optical discs, semiconductor memories, etc., are installed on drive 510 as needed, so that computer programs read from them can be installed into storage section 58 as needed.

[0137] When the above series of processes are implemented through software, the program constituting the software can be installed from a network such as the Internet or a storage medium such as a removable medium 511.

[0138] According to embodiments of this disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, some embodiments of this disclosure include a computer program product that, when run on a computer, causes the computer to perform the methods described in any of the foregoing embodiments. The computer program product includes computer instructions carried on a computer-readable medium, containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer instructions can be downloaded and installed from a network via communication section 59, or installed from storage section 58, or installed from ROM 52. When the computer program is executed by CPU 51, the methods of embodiments of this disclosure are performed.

[0139] It should be noted that, in the context of this disclosure, a computer-readable medium can be a tangible medium that may contain or store programs for use by or in conjunction with an instruction execution system, apparatus, or device.

[0140] A computer-readable medium may be a computer-readable storage medium, a computer-readable signal medium, or any combination thereof.

[0141] Computer-readable storage media include, but are not limited to, systems, apparatuses, or devices that are electrical, magnetic, optical, electromagnetic, infrared, or semiconductor, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this disclosure, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. Computer instructions are stored on the computer-readable storage medium that, when executed by a processor, implement the methods described in any of the foregoing embodiments.

[0142] Computer-readable signal media may include data signals propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. Computer-readable signal media may also be any computer-readable medium other than computer-readable storage media, capable of sending, propagating, or transmitting programs for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted using any suitable medium, including but not limited to: wires, optical fibers, RF (radio frequency), etc., or any suitable combination thereof.

[0143] The aforementioned computer-readable medium may be included in the aforementioned electronic device, or it may exist independently without being assembled into the electronic device.

[0144] In some embodiments, a computer program is also provided, comprising: instructions that, when executed by a processor, cause the processor to perform the methods described in any of the foregoing embodiments. For example, the instructions may be embodied in computer program code.

[0145] In embodiments of this disclosure, computer program code for performing the operations of this disclosure can be written in one or more programming languages or a combination thereof. These programming languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network (including a local area network (LAN) or a wide area network (WAN)), or it can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0146] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0147] The functions described above can be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary hardware logic components that can be used include: Field Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems-on-Chip (SoCs), Complex Programmable Logic Devices (CPLDs), and so on.

[0148] According to some embodiments of this disclosure, an interaction method is provided, applied to an intelligent agent, comprising: in response to the triggering of a webpage's fill-in function, determining one or more fields to be filled in the webpage; obtaining relevant information required for filling in one or more fields from the interaction information between the user and the intelligent agent based on the semantic information of the description information corresponding to the one or more fields to be filled in; generating content to be filled in corresponding to the one or more fields to be filled in based on the relevant information; filling in the content to be filled in the one or more fields to be filled in, and displaying the filled webpage.
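
The four steps of this method can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Field` type, the function names, and in particular the word-overlap comparison, which is only a crude stand-in for the semantic information that the disclosure leaves to the intelligent agent.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """A field to be filled and its description information (hypothetical type)."""
    name: str
    description: str


def tokens(text: str) -> set[str]:
    """Crude stand-in for semantic information: a set of lowercase words."""
    return {w.strip(".,:;!?").lower() for w in text.split() if w.strip(".,:;!?")}


def obtain_relevant_info(fields: list[Field], interaction: list[str]) -> dict[str, str]:
    """For each field, pick the interaction message that shares the most words with its description."""
    relevant = {}
    for f in fields:
        wanted = tokens(f.description)
        best = max(interaction, key=lambda m: len(tokens(m) & wanted), default=None)
        if best is not None and tokens(best) & wanted:
            relevant[f.name] = best
    return relevant


def generate_content(relevant: dict[str, str]) -> dict[str, str]:
    """Take the text after the last ' is ' as the value; a real agent would use a model here."""
    return {name: msg.rsplit(" is ", 1)[-1].rstrip(".") for name, msg in relevant.items()}


def fill(page: dict[str, str], content: dict[str, str]) -> dict[str, str]:
    """Return a copy of the page with the generated content filled in."""
    return {**page, **content}


fields = [Field("email", "Contact email address"), Field("city", "City of residence")]
interaction = ["My email address is ann@example.com.", "My city is Oslo."]
page = {"email": "", "city": ""}

filled = fill(page, generate_content(obtain_relevant_info(fields, interaction)))
print(filled)
```

The webpage is modeled here as a plain dictionary of field values; displaying the filled webpage is reduced to printing it.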

[0149] In some embodiments, generating the content to be filled for one or more fields to be filled based on relevant information includes: determining the target modality corresponding to each field to be filled based on the semantic information of the description information of one or more fields to be filled; and generating the content to be filled for each field to be filled presented in the target modality based on the target modality corresponding to each field to be filled and relevant information.

[0150] In some embodiments, obtaining relevant information required to fill in one or more fields from the user-agent interaction information based on the semantic information of the description information corresponding to one or more fields to be filled includes: generating guidance information based on the semantic information of the description information of one or more fields to be filled, wherein the guidance information is used to guide the user to provide relevant information; displaying the guidance information; in response to receiving information provided by the user based on the guidance information, determining whether the provided information is related to one or more fields to be filled, wherein the user-agent interaction information includes the provided information; and in response to the provided information being related to one or more fields to be filled, determining the provided information as relevant information.

[0151] In some embodiments, determining whether the provided information is related to one or more fields to be filled includes: identifying the modality of the provided information; parsing the provided information according to the modality of the provided information to determine the semantic information of the provided information; and determining whether the provided information is related to one or more fields to be filled based on the semantic information of the provided information and the semantic information of the description information of one or more fields to be filled.

[0152] In some embodiments, generating guidance information based on the semantic information of the description information of one or more fields to be filled includes: determining at least one of the data type, filling instructions, and an example of the content to be filled for each field to be filled based on the semantic information of the description information of each field to be filled; and generating guidance information based on at least one of the data type, filling instructions, and an example of the content to be filled for each field to be filled.
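
As a hedged illustration of how guidance information might be assembled once a data type, filling instructions, or an example are known for a field, consider the following sketch. The `FieldHint` type, the `build_guidance` function, and the wording of the prompt are hypothetical; the disclosure does not prescribe a prompt format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldHint:
    """What is known about one field to be filled (hypothetical type)."""
    label: str
    data_type: Optional[str] = None
    instructions: Optional[str] = None
    example: Optional[str] = None


def build_guidance(hints: list[FieldHint]) -> str:
    """Build one guidance line per field from whichever hints are available."""
    lines = []
    for h in hints:
        line = f"Please provide your {h.label}"
        if h.data_type:
            line += f" (type: {h.data_type})"
        line += "."
        if h.instructions:
            line += f" {h.instructions}"
        if h.example:
            line += f" Example: {h.example}"
        lines.append(line)
    return "\n".join(lines)


guidance = build_guidance([
    FieldHint("date of birth", data_type="date", example="1990-01-31"),
    FieldHint("passport number", instructions="Use the number printed on the photo page."),
])
print(guidance)
```

Each hint is optional, which mirrors the "at least one of" wording: a field with only an example still yields usable guidance.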

[0153] In some embodiments, determining at least one of the data type, filling instructions, and an example of the content to be filled for each field to be filled based on the semantic information of the description information of each field to be filled includes: generating search terms based on the semantic information of the webpage description information and the semantic information of the description information of each field to be filled; performing a search based on the search terms; and determining at least one of the data type, filling instructions, and an example of the content to be filled for each field to be filled based on the search results.

[0154] In some embodiments, generating the content to be filled for one or more fields based on relevant information includes: matching the provided information with each field based on the semantic information of the provided information and the semantic information of the description information of each field in one or more fields, and determining the information in the provided information that matches each field; and generating the content to be filled for each field based on the information in the provided information that matches each field.
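
The matching step can be sketched as follows, under strong simplifying assumptions: the provided information is taken to be text made of `label: value` segments separated by semicolons, and word overlap between a label and a field description stands in for the semantic comparison. The function names are hypothetical.

```python
def words(text: str) -> set[str]:
    """Crude stand-in for semantic information: a set of lowercase words."""
    return {w.strip(".,:;").lower() for w in text.split() if w.strip(".,:;")}


def match_provided_info(provided: str, field_descriptions: dict[str, str]) -> dict[str, str]:
    """Assign each 'label: value' segment to the field whose description it overlaps most."""
    content = {}
    for segment in provided.split(";"):
        label, sep, value = segment.partition(":")
        if not sep:
            continue  # no label to compare against in this sketch
        best_field, best_score = None, 0
        for field_id, description in field_descriptions.items():
            score = len(words(label) & words(description))
            if score > best_score:
                best_field, best_score = field_id, score
        if best_field is not None:
            content[best_field] = value.strip()
    return content


descriptions = {"full_name": "Applicant full name", "phone": "Contact phone number"}
matched = match_provided_info("Name: Ann Lee; Phone number: 555-0100; Hobby: chess", descriptions)
print(matched)
```

Segments that match no field, such as the hobby in this example, are simply left out of the content to be filled.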

[0155] In some embodiments, obtaining relevant information required to fill in one or more fields from the user-agent interaction information based on the description information corresponding to one or more fields to be filled includes: obtaining memory information corresponding to the user, wherein the memory information includes stored key information corresponding to the user, the key information being extracted from the user-agent interaction information; and determining relevant information from the memory information based on the semantic information of the description information corresponding to one or more fields to be filled.

[0156] In some embodiments, the interaction method further includes: during the interaction between the user and the intelligent agent, extracting key information corresponding to the user from the input information based on the semantic information of the user's input information, wherein the interaction information between the user and the intelligent agent includes the input information; and generating memory information based on the key information corresponding to the user.
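
A minimal sketch of building memory information during the interaction is given below. It assumes, purely for illustration, that key information can be recognized with two regular expressions; the disclosure instead relies on the semantic information of the input, and the names `KEY_PATTERNS` and `update_memory` are hypothetical.

```python
import re

# Hypothetical patterns standing in for semantic extraction of key information.
KEY_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s-]{6,}\d"),
}


def update_memory(memory: dict[str, str], user_input: str) -> dict[str, str]:
    """Return the memory extended with any key information found in one user message."""
    updated = dict(memory)
    for key, pattern in KEY_PATTERNS.items():
        found = pattern.search(user_input)
        if found:
            updated[key] = found.group(0)
    return updated


memory: dict[str, str] = {}
for message in ["Hi, can you book a table?", "Sure, reach me at ann@example.com or 555-010-0199."]:
    memory = update_memory(memory, message)
print(memory)
```

The stored memory can later be consulted by field name when a webpage's fill-in function is triggered.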

[0157] In some embodiments, obtaining relevant information required to fill in one or more fields from the user-agent interaction information based on the description information corresponding to one or more fields to be filled includes: obtaining the user's historical interaction information, wherein the user-agent interaction information includes historical interaction information; and obtaining relevant information from the historical interaction information based on the semantic information of the description information of one or more fields to be filled and the semantic information of the historical interaction information.

[0158] In some embodiments, obtaining relevant information required to fill in one or more fields from the user-agent interaction information based on the description information corresponding to one or more fields to be filled includes: obtaining storage information corresponding to the user, wherein the storage information includes at least one of historical interaction information and memory information, the user-agent interaction information includes historical interaction information, and the memory information includes stored key information corresponding to the user, the key information being extracted from the user-agent interaction information; determining whether some or all of the relevant information exists in the storage information based on the semantic information of the description information of one or more fields to be filled; in response to the existence of all relevant information in the storage information, obtaining the relevant information from the storage information; in response to the absence of relevant information or the presence of some relevant information in the storage information, taking one or more fields to be filled or the fields corresponding to the absent relevant information as target fields, generating guidance information based on the semantic information of the description information of the target fields, wherein the guidance information is used to guide the user to provide the relevant information required to fill in the target field; displaying the guidance information; in response to receiving information provided by the user based on the guidance information, determining whether the provided information is related to the target field, wherein the user-agent interaction information includes the provided information; and in response to the provided information being related to the target field, determining the provided information as the relevant information required to fill in the target field.
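
The storage-first flow with a guidance fallback can be sketched as below. The sketch makes several assumptions: storage information is a dictionary keyed by field identifier, lookup by key replaces the semantic lookup, the `ask` callback models displaying guidance and receiving the user's reply, and a non-empty reply is treated as related to the target field. All names are hypothetical.

```python
from typing import Callable


def plan_fill(field_descriptions: dict[str, str], storage: dict[str, str]):
    """Split fields into values found in storage and target fields that need guidance."""
    found = {f: storage[f] for f in field_descriptions if f in storage}
    guidance = {
        f: f"Please provide: {description}"
        for f, description in field_descriptions.items()
        if f not in storage
    }
    return found, guidance


def resolve(field_descriptions: dict[str, str], storage: dict[str, str],
            ask: Callable[[str], str]) -> dict[str, str]:
    """Use stored information where it exists and ask the user only for target fields."""
    found, guidance = plan_fill(field_descriptions, storage)
    for field_id, prompt in guidance.items():
        answer = ask(prompt).strip()
        if answer:  # placeholder for the relevance check on the provided information
            found[field_id] = answer
    return found


descriptions = {"email": "Contact email", "city": "City of residence"}
storage = {"email": "ann@example.com"}
asked: list[str] = []


def ask(prompt: str) -> str:
    asked.append(prompt)  # records which guidance was displayed
    return "Oslo"


result = resolve(descriptions, storage, ask)
print(result, asked)
```

Only the missing field triggers guidance, so the user is not asked again for information that is already stored.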

[0159] In some embodiments, in response to the webpage's fill-in function being triggered, determining one or more fields to be filled in the webpage includes: in response to a user opening the webpage, identifying the content of the webpage; in response to identifying that the webpage includes fields to be filled in, determining that the webpage's fill-in function is triggered, and determining one or more fields to be filled in; or in response to a user opening the webpage and entering a fill-in instruction, determining that the webpage's fill-in function is triggered based on the semantic information of the fill-in instruction, identifying the content of the webpage, and determining one or more fields to be filled in the webpage; or in response to a user opening the webpage, identifying the content of the webpage; in response to identifying that the webpage includes fields to be filled in, displaying a fill-in control; in response to the user triggering the fill-in control, determining that the webpage's fill-in function is triggered, and determining one or more fields to be filled in.

[0160] In some embodiments, determining one or more fields to be filled in a webpage includes: obtaining the document object model structure of the webpage, determining one or more fields to be filled in the webpage and their description information based on the document object model structure, wherein the description information includes at least one of name, tag information, and prompt information; and / or obtaining an image of the webpage, recognizing the image of the webpage, and determining one or more fields to be filled in the webpage and their description information, wherein obtaining the image of the webpage is pre-authorized by the user.
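
For the document object model branch, the following sketch uses Python's standard `html.parser` module to collect fields and their description information from page markup. Mapping the `name` attribute to the name, the `<label>` text to the tag information, and the `placeholder` attribute to the prompt information is an assumption made here, as are the class and function names; the image-recognition branch is not covered.

```python
from html.parser import HTMLParser


class FieldCollector(HTMLParser):
    """Collects input, textarea and select elements plus the text of <label for=...> tags."""

    def __init__(self):
        super().__init__()
        self.fields = []        # one dict per field to be filled
        self.labels = {}        # maps a label's `for` target to its text
        self._label_for = None  # id targeted by the <label> currently open

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "label":
            self._label_for = attributes.get("for")
        elif tag in ("input", "textarea", "select"):
            if attributes.get("type") in ("hidden", "submit", "button"):
                return  # not something a user fills in
            self.fields.append({
                "id": attributes.get("id"),
                "name": attributes.get("name"),
                "prompt": attributes.get("placeholder"),
            })

    def handle_data(self, data):
        if self._label_for and data.strip():
            self.labels[self._label_for] = data.strip()

    def handle_endtag(self, tag):
        if tag == "label":
            self._label_for = None


def fields_to_fill(html: str) -> list[dict]:
    """Return the fields to be filled together with their description information."""
    collector = FieldCollector()
    collector.feed(html)
    collector.close()
    for field in collector.fields:
        field["label"] = collector.labels.get(field["id"])
    return collector.fields


sample = """
<form>
  <label for="email">Email address</label>
  <input id="email" name="email" type="email" placeholder="you@example.com">
  <input type="submit" value="Send">
</form>
"""
found_fields = fields_to_fill(sample)
print(found_fields)
```

Labels are attached after parsing, so a `<label>` may appear before or after the element it describes.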

[0161] In some embodiments, filling in the content to be filled into one or more fields and displaying the filled webpage includes: displaying the content to be filled for each field in an interactive area; in response to the user's confirmation operation, filling in the content to be filled for each field and displaying the filled webpage; in response to the user's modification operation on the content to be filled for a certain field, filling in the modified content to be filled for each field and displaying the filled webpage.
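
The confirm-or-modify step might look like the sketch below, which again models the webpage as a dictionary of field values. The function name `fill_after_review` and the choice to ignore modifications for fields that were never proposed are assumptions of this sketch.

```python
from typing import Optional


def fill_after_review(page: dict[str, str], proposed: dict[str, str],
                      modifications: Optional[dict[str, str]] = None) -> dict[str, str]:
    """Fill the page with the proposed content, applying any user modifications first."""
    edits = modifications or {}
    # Only fields that were proposed in the interactive area can be modified.
    final = {field: edits.get(field, value) for field, value in proposed.items()}
    return {**page, **final}


page = {"name": "", "city": "", "notes": "keep"}
proposed = {"name": "Ann Lee", "city": "Olso"}

confirmed = fill_after_review(page, proposed)                    # user confirms as is
modified = fill_after_review(page, proposed, {"city": "Oslo"})   # user corrects one field
print(confirmed, modified)
```

In both cases the original page is left unchanged and a filled copy is returned for display.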

[0162] According to some other embodiments of this disclosure, an interactive device is provided, comprising: a determining module configured to determine one or more fields to be filled in the webpage in response to the triggering of a fill-in function on the webpage; an obtaining module configured to obtain relevant information required for filling in one or more fields from interaction information between the user and the interactive device based on semantic information of descriptive information corresponding to the one or more fields to be filled; a generating module configured to generate content to be filled in corresponding to the one or more fields to be filled based on the relevant information; and a display module configured to fill in the content to be filled in the one or more fields to be filled and display the filled webpage.

[0163] According to further embodiments of the present disclosure, an electronic device is provided, comprising: a memory; and a processor coupled to the memory, the processor being configured to execute an interaction method as described in any embodiment of the present disclosure based on instructions stored in the memory.

[0164] According to further embodiments of the present disclosure, a computer-readable storage medium is provided having a computer program stored thereon that, when executed by a processor, implements the interaction method of any embodiment of the present disclosure.

[0165] According to some other embodiments of the present disclosure, a computer program is provided, comprising: instructions that, when executed by a processor, cause the processor to perform the interaction method according to any embodiment of the present disclosure.

[0166] According to further embodiments of the present disclosure, a computer program product is provided, including instructions that, when executed by a processor, implement the interaction method of any embodiment of the present disclosure.

[0167] While specific embodiments of this disclosure have been described in detail by way of example, those skilled in the art should understand that the examples are for illustrative purposes only and not intended to limit the scope of this disclosure. Those skilled in the art should understand that modifications can be made to the above embodiments without departing from the scope and spirit of this disclosure. The scope of this disclosure is defined by the appended claims.

Claims

1. An interaction method applied to an intelligent agent, comprising: in response to a fill-in function of a webpage being triggered, determining one or more fields to be filled in the webpage; obtaining, based on semantic information of description information corresponding to the one or more fields to be filled, relevant information required to fill in the one or more fields to be filled from interaction information between a user and the intelligent agent; generating, based on the relevant information, content to be filled corresponding to the one or more fields to be filled; and filling the content to be filled into the one or more fields to be filled, and displaying the filled webpage.

2. The interaction method according to claim 1, wherein generating, based on the relevant information, the content to be filled corresponding to the one or more fields to be filled comprises: determining, based on the semantic information of the description information of the one or more fields to be filled, a target modality corresponding to each of the one or more fields to be filled; and generating, based on the target modality corresponding to each field to be filled and the relevant information, the content to be filled for each field to be filled, presented in the target modality.

3. The interaction method according to claim 1, wherein obtaining, based on the semantic information of the description information corresponding to the one or more fields to be filled, the relevant information required to fill in the one or more fields to be filled from the interaction information between the user and the intelligent agent comprises: generating guidance information based on the semantic information of the description information of the one or more fields to be filled, wherein the guidance information is used to guide the user to provide the relevant information; displaying the guidance information; in response to receiving information provided by the user based on the guidance information, determining whether the provided information is related to the one or more fields to be filled, wherein the interaction information between the user and the intelligent agent includes the provided information; and in response to the provided information being related to the one or more fields to be filled, determining the provided information as the relevant information.

4. The interaction method according to claim 3, wherein determining whether the provided information is related to the one or more fields to be filled comprises: identifying a modality of the provided information; parsing the provided information according to the modality of the provided information to determine semantic information of the provided information; and determining, based on the semantic information of the provided information and the semantic information of the description information of the one or more fields to be filled, whether the provided information is related to the one or more fields to be filled.

5. The interaction method according to claim 3 or 4, wherein generating the guidance information based on the semantic information of the description information of the one or more fields to be filled comprises: determining, based on the semantic information of the description information of each of the one or more fields to be filled, at least one of a data type, filling instructions, and an example of the content to be filled corresponding to each field to be filled; and generating the guidance information based on the at least one of the data type, the filling instructions, and the example of the content to be filled corresponding to each field to be filled.

6. The interaction method according to claim 5, wherein determining, based on the semantic information of the description information of each of the one or more fields to be filled, the at least one of the data type, the filling instructions, and the example of the content to be filled corresponding to each field to be filled comprises: generating search terms based on semantic information of description information of the webpage and the semantic information of the description information of each field to be filled; performing a search according to the search terms; and determining, based on the search results, the at least one of the data type, the filling instructions, and the example of the content to be filled corresponding to each field to be filled.

7. The interaction method according to any one of claims 4-6, wherein generating, based on the relevant information, the content to be filled corresponding to the one or more fields to be filled comprises: matching the provided information with each field to be filled based on the semantic information of the provided information and the semantic information of the description information of each of the one or more fields to be filled, to determine the information in the provided information that matches each field to be filled; and generating the content to be filled for each field to be filled based on the information in the provided information that matches that field.

8. The interaction method according to any one of claims 1-7, wherein obtaining, based on the description information corresponding to the one or more fields to be filled, the relevant information required to fill in the one or more fields to be filled from the interaction information between the user and the intelligent agent comprises: obtaining memory information corresponding to the user, wherein the memory information includes stored key information corresponding to the user, and the key information is extracted from the interaction information between the user and the intelligent agent; and determining the relevant information from the memory information based on the semantic information of the description information corresponding to the one or more fields to be filled.

9. The interaction method according to claim 8, further comprising: during interaction between the user and the intelligent agent, extracting the key information corresponding to the user from input information of the user based on semantic information of the input information, wherein the interaction information between the user and the intelligent agent includes the input information; and generating the memory information based on the key information corresponding to the user.

10. The interaction method according to any one of claims 1-9, wherein obtaining, based on the description information corresponding to the one or more fields to be filled, the relevant information required to fill in the one or more fields to be filled from the interaction information between the user and the intelligent agent comprises: obtaining historical interaction information corresponding to the user, wherein the interaction information between the user and the intelligent agent includes the historical interaction information; and obtaining the relevant information from the historical interaction information based on the semantic information of the description information of the one or more fields to be filled and semantic information of the historical interaction information.

11. The interaction method according to claim 1 or 2, wherein obtaining, based on the description information corresponding to the one or more fields to be filled, the relevant information required to fill in the one or more fields to be filled from the interaction information between the user and the intelligent agent comprises: obtaining storage information corresponding to the user, wherein the storage information includes at least one of historical interaction information and memory information, the interaction information between the user and the intelligent agent includes the historical interaction information, the memory information includes stored key information corresponding to the user, and the key information is extracted from the interaction information between the user and the intelligent agent; determining, based on the semantic information of the description information of the one or more fields to be filled, whether some or all of the relevant information exists in the storage information; in response to all of the relevant information existing in the storage information, obtaining the relevant information from the storage information; in response to none or only part of the relevant information existing in the storage information, taking the one or more fields to be filled, or the fields to be filled corresponding to the missing part of the relevant information, as target fields, and generating guidance information based on the semantic information of the description information of the target fields, wherein the guidance information is used to guide the user to provide the relevant information required to fill in the target fields; displaying the guidance information; in response to receiving information provided by the user based on the guidance information, determining whether the provided information is related to the target fields, wherein the interaction information between the user and the intelligent agent includes the provided information; and in response to the provided information being related to the target fields, determining the provided information as the relevant information required to fill in the target fields.

12. The interaction method according to any one of claims 1-11, wherein determining, in response to the fill-in function of the webpage being triggered, the one or more fields to be filled in the webpage comprises: in response to the user opening the webpage, identifying content of the webpage, and in response to identifying that the webpage includes fields to be filled, determining that the fill-in function of the webpage is triggered and determining the one or more fields to be filled; or in response to the user opening the webpage and inputting a fill-in instruction, determining, based on semantic information of the fill-in instruction, that the fill-in function of the webpage is triggered, identifying the content of the webpage, and determining the one or more fields to be filled in the webpage; or in response to the user opening the webpage, identifying the content of the webpage, in response to identifying that the webpage includes fields to be filled, displaying a fill-in control, and in response to the user triggering the fill-in control, determining that the fill-in function of the webpage is triggered and determining the one or more fields to be filled.

13. The interaction method according to claim 12, wherein determining the one or more fields to be filled in the webpage comprises: obtaining a document object model structure of the webpage, and determining, based on the document object model structure, the one or more fields to be filled in the webpage and their description information, wherein the description information includes at least one of a name, tag information, and prompt information; and/or obtaining an image of the webpage, recognizing the image of the webpage, and determining the one or more fields to be filled in the webpage and their description information, wherein obtaining the image of the webpage is authorized by the user in advance.

14. The interaction method according to any one of claims 1-13, wherein filling the content to be filled into the one or more fields to be filled and displaying the filled webpage comprises: displaying the content to be filled for each field to be filled in an interactive area; in response to a confirmation operation by the user, filling the content to be filled for each field to be filled into that field and displaying the filled webpage; in response to a modification operation by the user on the content to be filled for a field to be filled, filling the modified content to be filled for each field to be filled into that field and displaying the filled webpage.

15. An interactive device, comprising: a determination module configured to determine, in response to a fill-in function of a webpage being triggered, one or more fields to be filled in the webpage; an acquisition module configured to acquire, based on semantic information of description information corresponding to the one or more fields to be filled, relevant information required to fill in the one or more fields to be filled from interaction information between a user and the interactive device; a generation module configured to generate, based on the relevant information, content to be filled corresponding to the one or more fields to be filled; and a display module configured to fill the content to be filled into the one or more fields to be filled and display the filled webpage.

16. An electronic device, comprising: a memory; and a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the interaction method according to any one of claims 1 to 14.

17. A computer-readable storage medium having a computer program stored thereon that, when executed by a processor, implements the interaction method of any one of claims 1 to 14.

18. A computer program product comprising: Instructions, wherein when executed by a processor, the instructions implement the interaction method of any one of claims 1 to 14.

19. A computer program comprising: Instructions, wherein when executed by a processor, the instructions implement the interaction method of any one of claims 1 to 14.