Interaction method and device based on multi-model cooperation, agent and electronic device
By employing a multi-model collaborative interaction method, the first and second major models are used to understand the intent and semantics of the demand information, generating response information that matches the user's needs. This solves the problem of users having difficulty finding matching resources and improves information matching accuracy and user experience.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING BAIDU NETCOM SCI & TECH CO LTD
- Filing Date
- 2025-07-14
- Publication Date
- 2026-06-26
AI Technical Summary
Users struggle to quickly find internet resources that match their needs, and existing search technologies are unable to accurately understand users' interests and needs, resulting in a significant discrepancy between information content and user requirements, making it difficult to meet user needs.
A multi-model collaborative interaction method is adopted. The first model is used to understand the intent of demand information and resource-related features and generate demand intent description text. The second model is used to understand the semantics of demand intent description text and search results and generate response information to improve the matching degree.
By collaborating with multiple models, the matching degree between response information and users' actual needs is improved, the frequency and complexity of interactions are reduced, and the user experience is enhanced.
Smart Images

Figure CN120805926B_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of artificial intelligence technology, and in particular to the fields of intelligent response, intelligent search, resource recommendation, and intelligent customer service. Background Technology
[0002] With the rapid development of artificial intelligence technology, users can conveniently browse news, videos, and other resources through smartphones and other terminal devices. Alternatively, users can also search for information by entering search terms on their terminal devices to meet their needs for information acquisition, such as travel planning and knowledge learning. Summary of the Invention
[0003] This disclosure provides an interaction method, apparatus, intelligent agent, and storage medium based on multi-model collaboration.
[0004] According to one aspect of this disclosure, an interaction method based on multi-model collaboration is provided, comprising: receiving demand information input by a target object; using a first major model to perform intent understanding on the demand information and resource-related features to obtain demand intent description text, wherein the resource-related features are related to resources browsed by the target object, and the demand intent description text represents the target object's degree of demand for resource content in natural language form; using a second major model to perform semantic understanding on the demand intent description text and search results determined based on the demand information to obtain response information; and pushing the response information to the target object.
[0005] According to another aspect of this disclosure, an interactive device based on multi-model collaboration is provided, comprising: a receiving module for receiving demand information input by a target object; a first obtaining module for performing intent understanding on the demand information and resource-related features using a first major model to obtain demand intent description text, wherein the resource-related features are related to resources browsed by the target object, and the demand intent description text represents the target object's degree of demand for resource content in natural language form; a second obtaining module for performing semantic understanding on the demand intent description text and search results determined based on the demand information using a second major model to obtain response information; and a push module for pushing the response information to the target object.
[0006] According to another aspect of this disclosure, an artificial intelligence agent is provided, comprising: an input module for receiving input information; a processing module for determining a target task based on the input information received by the input module, determining a large model based on the target task, and obtaining output information by calling the large model to execute the method provided according to the embodiments of this disclosure; and an output module for outputting the output information obtained by the processing module.
[0007] According to another aspect of this disclosure, an electronic device is provided, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform a method provided according to an embodiment of this disclosure.
[0008] According to another aspect of this disclosure, a non-transitory computer-readable storage medium is provided storing computer instructions, wherein the computer instructions are used to cause the computer to perform a method provided according to an embodiment of this disclosure.
[0009] According to another aspect of this disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the method provided according to embodiments of this disclosure.
[0010] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will become readily apparent from the following description. Attached Figure Description
[0011] The accompanying drawings are provided to better understand this solution and do not constitute a limitation of this disclosure. Wherein:
[0012] Figure 1 The illustration schematically shows an exemplary system architecture for applying multi-model collaboration-based interaction methods and apparatus according to embodiments of the present disclosure;
[0013] Figure 2 A flowchart illustrating a multi-model collaboration-based interaction method according to an embodiment of the present disclosure is shown schematically.
[0014] Figure 3 The illustration shows a schematic diagram of the principle of the multi-model collaboration-based interaction method provided according to an embodiment of the present disclosure;
[0015] Figure 4 The illustration shows a schematic diagram of the principle of an interaction method based on multi-model collaboration according to another embodiment of the present disclosure;
[0016] Figure 5 A block diagram of a multi-model collaboration-based interactive device according to an embodiment of the present disclosure is shown schematically.
[0017] Figure 6 A schematic diagram illustrating the structure of an intelligent agent of artificial intelligence according to embodiments of the present disclosure; and
[0018] Figure 7A schematic block diagram of an example electronic device is shown that can be used to implement the multi-model collaboration-based interaction method of embodiments of the present disclosure. Detailed Implementation
[0019] The exemplary embodiments of this disclosure are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding, and should be considered merely exemplary. Therefore, those skilled in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of this disclosure. Similarly, for clarity and brevity, descriptions of well-known functions and structures are omitted in the following description.
[0020] In the technical solution disclosed herein, the acquisition, storage, and application of user personal information comply with the provisions of relevant laws and regulations, necessary confidentiality measures have been taken, and there is no violation of public order and good morals.
[0021] The inventors discovered that with the rapid development of Internet technology, the massive amount of resource data generated on the Internet makes it difficult for users to quickly find the resource content they need. This can easily lead to the searched information being irrelevant to the user's needs, or the searched information being too different from the user's actual interests and needs, making it difficult to meet the user's needs.
[0022] This disclosure provides an interaction method, device, intelligent agent, and storage medium based on multi-model collaboration. The interaction method based on multi-model collaboration includes: receiving demand information input by a target object; using a first major model to perform intent understanding on the demand information and resource-related features to obtain a demand intent description text, wherein the resource-related features are related to resources browsed by the target object, and the demand intent description text represents the target object's degree of demand for resource content in natural language form; using a second major model to perform semantic understanding on the demand intent description text and search results determined based on the demand information to obtain response information; and pushing the response information to the target object.
[0023] According to embodiments of this disclosure, a demand description text is obtained by using a first model to understand the intent of demand information and resource-related features. This allows the target object's interest in the resource content they have browsed to be captured by describing the target object's demand for the resource content based on the demand description text using natural language. A second model can then be used to semantically understand the demand intent description text and search results. This allows the second model to more accurately understand the target object's demand for the resource content and changes in interest through the natural language attributes of the demand description text, such as its grammatical structure and demand level descriptions. The matching degree between this understanding and the information content of the search results retrieved based on the demand information can then be improved. This allows the generated response information to match the target object's demand for the resource content, thus improving the accuracy of response information delivery, reducing interaction frequency and complexity, and enhancing the user experience.
[0024] Figure 1 The illustration schematically shows an exemplary system architecture for applying multi-model collaborative interaction methods and apparatus according to embodiments of the present disclosure.
[0025] It is important to note that Figure 1 The examples shown are merely examples of system architectures that can be applied to embodiments of this disclosure, intended to help those skilled in the art understand the technical content of this disclosure. However, they do not imply that embodiments of this disclosure cannot be used in other devices, systems, environments, or scenarios. For example, in another embodiment, an exemplary system architecture for applying the multi-model collaboration-based interaction method and apparatus may include a terminal device. However, the terminal device may implement the multi-model collaboration-based interaction method and apparatus provided by embodiments of this disclosure without interacting with a server.
[0026] like Figure 1 As shown, the system architecture 100 according to this embodiment may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 serves as a medium for providing a communication link between the terminal devices 101, 102, and 103 and the server 105. The network 104 may include various connection types, such as wired and / or wireless communication links, etc.
[0027] Users can use terminal devices 101, 102, and 103 to interact with server 105 via network 104 to receive or send messages, etc. Various communication client applications can be installed on terminal devices 101, 102, and 103, such as knowledge reading applications, web browser applications, search applications, instant messaging tools, email clients, and / or social platform software, etc. (for example only).
[0028] Terminal devices 101, 102, and 103 can be various electronic devices with displays and web browsing capabilities, including but not limited to smartphones, tablets, laptops, and desktop computers.
[0029] Server 105 can be a server that provides various services, such as a backend management server that supports the content browsed by users using terminal devices 101, 102, and 103 (for example only). The backend management server can analyze and process data such as received user requests, and feed back the processing results (such as web pages, information, or data obtained or generated according to user requests) to the terminal devices.
[0030] Server 105 can be a cloud server, also known as a cloud computing server or cloud host. It is a host product in the cloud computing service system, which solves the shortcomings of traditional physical hosts and VPS services ("Virtual Private Server", or simply "VPS"), such as high management difficulty and weak business scalability. Server 105 can also be a server for a distributed system or a server combined with blockchain.
[0031] It should be noted that the multi-model collaboration-based interaction method provided in this disclosure embodiment can generally be executed by terminal devices 101, 102, or 103. Correspondingly, the multi-model collaboration-based interaction device provided in this disclosure embodiment can also be disposed in terminal devices 101, 102, or 103.
[0032] Alternatively, the multi-model collaboration-based interaction method provided in this disclosure can generally be executed by server 105. Correspondingly, the multi-model collaboration-based interaction device provided in this disclosure can generally be located in server 105. The multi-model collaboration-based interaction method provided in this disclosure can also be executed by a server or server cluster that is different from server 105 and capable of communicating with terminal devices 101, 102, 103 and / or server 105. Correspondingly, the multi-model collaboration-based interaction device provided in this disclosure can also be located in a server or server cluster that is different from server 105 and capable of communicating with terminal devices 101, 102, 103 and / or server 105.
[0033] It should be understood that Figure 1 The number of terminal devices, networks, and servers shown is merely illustrative. Depending on implementation needs, any number of terminal devices, networks, and servers can be included.
[0034] Figure 2 A flowchart illustrating an interaction method based on multi-model collaboration according to an embodiment of the present disclosure is shown schematically.
[0035] like Figure 2 As shown, the interaction method based on multi-model collaboration includes operations S210~S240.
[0036] In operation S210, the requirement information input by the target object is received.
[0037] In operation S220, the first major model is used to understand the intent of demand information and resource-related features, and the resulting demand intent description text is obtained.
[0038] In operation S230, the second major model is used to perform semantic understanding on the text describing the demand intent and the search results determined based on the demand information to obtain the response information.
[0039] In operation S240, a response message is pushed to the target object.
[0040] According to embodiments of this disclosure, the requirement information input by the target object may include data in any modality, such as text, voice, or images. The target object can input the requirement information through a terminal device such as a smartphone.
[0041] According to embodiments of this disclosure, the first and second large-scale models can be generative language models built based on deep learning algorithms. These large-scale language models have a large number of parameters, such as hundreds of millions to trillions, to achieve semantic understanding of natural language representations and perform generative data processing tasks based on this large-scale parameter set, generating structured or unstructured data such as text, tables, and charts. The first and second large-scale models are large-scale models suitable for performing data processing tasks under different task requirements, and they can have different models adapted to different scenarios.
[0042] In some embodiments, resource-related features are related to resources viewed by the target object. Resources viewed by the target object may include any browsable resource such as video resources, news resources, and product information resources. Resource-related features may also be related to any resource content, such as the resource's main content, video screenshots, comments, or main text.
[0043] According to embodiments of this disclosure, the demand intent description text represents the degree of demand of the target object for the resource content in natural language form. The degree of demand may include attention, preference, etc. The demand intent description text can describe the degree of demand of the target object for the resource content based on grammatical structures such as adjectives, words indicating time changes, and phrases indicating probability.
[0044] According to embodiments of this disclosure, the search results can be any type of information such as page text, images, and videos obtained based on demand information.
[0045] In some embodiments, the demand intent description text can also represent the changes in the target object's demand for resource content, so that the second model can more accurately understand the changes in the target object's demand intent to process the search results and generate response information that matches the changes in demand intent, thereby improving the matching degree between the response information and the actual needs of the target object and improving the accuracy and timeliness of the response.
[0046] According to embodiments of this disclosure, the demand intent description text, in natural language form, represents the target object's degree of demand for resource content. This allows the second model to perform semantic understanding based on the target object's degree of demand for resource content expressed in the natural language form of the demand intent description text, and to learn the target object's preference or level of interest in different resource content. Thus, the second model can semantically understand the demand intent description text and the search results determined based on demand information to generate response information that matches the target object's degree of demand for resource content.
[0047] In one example, the demand information could be "Where to travel this weekend?". The demand intent description text would be: "The user is interested in cities A and B. For example, they might have a need to experience the urban rail transit in city A, and a need to take photos at attraction B in city B. However, they are very interested in the cruise ship program in city B over the past month." It should be understood that city A, city B, attraction B, and the cruise ship program in city B can be resources that the target user has browsed. Phrases like "might have a need to experience" and "very interested" in the demand intent description text can indicate the degree of demand.
[0048] The second model, through semantic understanding of the demand intent description text and the search results determined based on demand information, can generate a response such as: "Based on your interests and preferences, you can refer to the following itinerary planning: a boat tour on the famous city river in City B, with a ticket price of 110 yuan and a duration of 1 hour. You can book your boat ticket through the following link: XXXY.COM. Additionally, you can visit attraction B after the boat tour by booking tickets for attraction B through the following link: XyyyyY.COM." The demand intent description text can express the target audience's degree of demand for resource content and the changes in that degree of demand in natural language. This avoids the difficulty of the second model in understanding demand information based on discrete resource content, enabling a clearer and more accurate understanding of the target audience's degree of demand for resource content based on the demand intent description text. It then processes information related to various resource contents in the search results according to the degree of demand for resource content, generating response information that matches the target audience's actual intent. Thus, through the collaborative processing of resource-related features and demand information by the first and second models, a more accurate response information representing the user's actual demand intent can be generated.
[0049] In one embodiment, different types of role attribute instructions are used to instruct the first and second main models to perform data processing tasks based on the role attributes corresponding to their respective role attribute instructions, depending on the task requirements. The role attributes represented by the role attribute instructions may include demand preference analysis expert attributes and response content planning expert attributes. Therefore, by inputting role attribute instructions representing demand preference analysis expert attributes and response content planning expert attributes into the first and second main models respectively, the models can be instructed to perform tasks to generate demand intent description text and response information.
[0050] In some embodiments, using the first major model to perform intent understanding on demand information and resource-related features to obtain demand intent description text may include: based on intent understanding prompts, using the first major model to perform intent understanding tasks according to demand information and resource-related features.
[0051] According to embodiments of this disclosure, intent understanding prompts are used to prompt a first-level model to perform an intent understanding task based on features related to resource content to obtain a demand intent description text. Features related to resource content can represent the target object's interaction behavior with the resource content, such as browsing duration, browsing frequency, liking behavior, etc. The intent understanding prompts can be represented based on data in any format, such as symbols, text, or script code. These prompts can serve as input data for the first-level model, prompting it to understand the target object's demand for different resource content by analyzing the target object's interaction behavior related to resource content and the demand intent expressed by the input demand information. This allows the demand intent description text to more accurately describe the target object's demand for different resource content, enabling the second-level model to output more accurate response information by processing the search results and the demand intent description text.
[0052] In some embodiments, features related to resource content may include at least one of browsing duration features, browsing frequency features, liking behavior features, and commenting behavior features. Features related to resource content can be obtained by extracting or encoding features from interactive behaviors such as browsing duration, browsing frequency, liking behavior, and commenting behavior of the target object.
[0053] In some embodiments, performing an intent understanding task based on intent understanding prompts and utilizing a first major model to perform an intent understanding task based on demand information and resource-related features may include: performing an intent understanding task based on intent understanding prompts and utilizing a first major model to perform an intent understanding task based on related resource features among demand information and resource-related features.
[0054] Specifically, the associated resource features and the demand information meet a preset semantic similarity condition. Associated resource features can be related to associated resources viewed by the target object, and associated resources can include resources semantically related to the topic represented by the demand information. For example, if the demand information is "Where to go this weekend?", the associated resource features could be related to travel-related videos, news, and other resources viewed by the target object.
[0055] According to embodiments of this disclosure, by controlling the first model to understand the intent based on the intent understanding prompt information, the resource-related features in the first model that are not related to the intent represented by the demand information can be removed, and the noise interference received by the first model can be reduced. This allows the demand intent description text to describe the degree of demand or attention of the demand information for different resource content more accurately and with fine granularity, thereby improving the matching degree between the response information output by the second model and the actual intent of the target object.
[0056] In some embodiments, the first major model can leverage the prompting function of intent-understanding prompts to process demand information, the target object's historical search information, and the content features of resources found based on historical search information that satisfy semantic similarity conditions with the demand information. This allows for analysis of the multiple resource contents the target object needs to focus on by inputting the current demand information, and the corresponding degree of demand for those resource contents. This improves the accuracy of the demand intent description text in describing the target object's degree of demand for different resource contents. Thus, it achieves a fine-grained description of the degree of demand for resource content through intent description text, thereby improving the accuracy of subsequent response information.
[0057] In some embodiments, the search results may include basic search results retrieved based on basic demand information and extended search results retrieved based on extended demand information. The basic demand information and the basic intent represented by the demand information satisfy the semantic similarity condition. The extended demand information related to the extended intent of the target object and the basic demand information satisfy the semantic difference condition. The second major model can generate response information by processing demand information, basic search results, and extended search results to meet the potential needs of the target object during the input of demand information, thereby improving the accuracy and quality of the response.
[0058] In some embodiments, the search results are determined based on the following operations: updating the demand information based on resource-related features and object environment attributes related to the demand information to obtain target demand information; and performing a search based on the target demand information to obtain detection results.
[0059] According to embodiments of this disclosure, the object's environmental attributes are related to the environment surrounding the target object during the input of demand information. For example, they may be related to the weather, time, and geographical location around the target object. By updating the demand information based on resource-related features and object environmental attributes, the obtained target demand information can be related to the target object's demand intent under specific environmental conditions. Furthermore, the target demand information can more accurately retrieve resources related to topics with resource-related features, thus accurately representing the target object's actual information retrieval intent at the current time. Therefore, search results can be retrieved using the target demand information, allowing the second model to process resource-related information in the search results more completely and comprehensively according to the target object's demand for resource content, thereby improving the matching degree between the response information and the target object's actual needs.
[0060] In some embodiments, the object environment attributes include at least one of the following attributes related to the process of inputting requirement information for the target object: time attribute, location attribute, and weather condition attribute.
[0061] In some embodiments, the time attribute may include time period data, such as a field or identifier representing a time period like "9:00 AM to 11:00 AM". Alternatively, the time attribute may also include data representing time periods for specific scenarios such as holidays or lunchtime.
[0062] In some embodiments, location attributes may include data representing the geographic location of the target object. Weather attributes may represent the weather conditions near the target object during the input of requirement information, and may include, for example, information related to weather conditions such as rainfall, typhoon warnings, snowfall, temperature, and humidity.
[0063] By updating demand information based on resource-related features and object-environment attributes related to the demand information, the target demand information can more accurately represent the target object's current search intent and demand intent, and adapt to specific conditions such as the target object's current weather, geographical location, and time requirements. For example, it can avoid determining the target demand information as a search for tourist attractions with a travel duration of more than 4 days when the target object is on a weekday and there is no long holiday in sight. This allows the second model to process the search results retrieved based on the target demand information and the demand intent description text to generate response information that accurately represents the user's actual needs, thereby improving the accuracy of information responses.
[0064] In some embodiments, the target requirement information may include basic requirement information and extended requirement information output by the third model through processing resource-related features and object environment attributes. The basic requirement intent can have a high semantic similarity to the requirement information, and defects such as semantic ambiguity, logical inconsistencies, or missing information in the requirement information representation can be modified to ensure that the basic requirement information can accurately represent the requirement intent expressed by the requirement information, avoiding the illusion caused by the second model processing the requirement information, which leads to a mismatch between the response information and the actual requirement intent.
[0065] Therefore, by retrieving target demand information that meets the environmental conditions such as time conditions, location conditions, and environmental conditions represented by the object's environmental attributes, basic search results and extended search results can be obtained. This allows the second model to process the basic search results and extended search results based on the degree of demand for resource content represented by the demand intent description text. This enables the response information to more accurately represent the target object's basic demand intent and extended demand intent, reducing the interaction process for the target object to continue following up with extended demand intent to input demand information, lowering the complexity of interaction and the learning cost of interaction operation steps, and improving the efficiency of information interaction and the quality of response information push.
[0066] It should be noted that any data required to be obtained in any embodiment of this disclosure, including but not limited to resource-related characteristics and object environmental attributes, is obtained under the condition of obtaining authorization from the relevant users or organizations. The purpose of obtaining the data is disclosed to the relevant users or organizations before the data is obtained, and necessary encryption or desensitization measures are taken for the data, which complies with the provisions of relevant laws and regulations and does not violate public order and good morals.
[0067] In some embodiments, retrieving based on target demand information to obtain detection results may include: retrieving based on target demand information to obtain multiple initial detection results; and determining a detection result from the multiple initial detection results based on the semantic matching degree between the initial detection results and the target demand information.
[0068] In one example, semantic relevance can be detected between multiple initial search results obtained from retrieving basic demand information and the basic demand information itself. The semantic matching degree between the initial search results and the basic demand information is obtained. This allows for the determination of search results from the initial search results that are relevant to the topic of the basic demand information, thus avoiding the large model processing resources irrelevant to the basic demand intent to obtain response information, thereby improving the quality of the response information.
[0069] In one example, semantic relevance detection can be performed between multiple initial search results obtained from the extended demand information retrieval and the extended demand information itself. This yields the semantic matching degree between the initial search results and the extended demand information. Based on this semantic matching degree, relevant search results can be determined from the initial search results that are related to the topic of the extended demand information. This avoids the large model processing resources irrelevant to the extended demand intent to obtain response information, thus improving the quality of the response information.
[0070] In one example, multiple initial detection results can be retrieved based on basic and extended demand information, respectively. The semantics of these initial search results can be semantically correlated with their corresponding basic or extended demand information. Based on the semantic matching degree, the search result corresponding to each basic or extended demand information is determined from the multiple initial search results for each type of basic or extended demand information. Therefore, the second major model can be used to process the search results corresponding to multiple types of basic and extended demand information to obtain response information. This allows the second major model to combine the target audience's degree of demand for resource content to output response information that satisfies both basic and extended demand intents, thereby further improving the quality of information response.
[0071] In some embodiments, updating the demand information based on the resource-related features and object environment attributes related to the demand information may include: performing at least one subtask in the demand information update task using a third major model, wherein the subtask may include at least one of the first subtask and the second subtask.
[0072] The first subtask may include updating the requirement information based on its semantic features to obtain basic requirement information. By utilizing the third major model to process the requirement information and perform the first subtask, the output basic requirement information can correct or supplement any unclear semantic expressions in the requirement information, so that the basic requirement information can accurately represent the requirement intent expressed in the requirement information.
[0073] In some embodiments, based on task prompts for the first subtask, the third model is prompted to perform semantic understanding by processing requirement information, a pre-defined knowledge base, and contextual content related to the target object's requirement information to update the requirement information and output rewritten basic requirement information. The knowledge base can contain knowledge content related to various domains. By combining contextual content and the knowledge base to understand and rewrite the requirement information, the third model can more accurately understand the requirement intent and supplement or correct linguistic deficiencies in the requirement information by outputting updated basic requirement intent. This ensures that the basic search results retrieved based on the basic requirement information meet the actual requirement intent of the target object. Therefore, the quality of the response information can be improved by processing the basic search results and requirement intent description information based on the second model.
[0074] The second subtask may include: detecting the resource browsing habits of the target object based on the search resource features in the resource-related features, obtaining resource browsing habit attributes, and detecting extended needs based on the resource browsing habit attributes and object environment attributes, obtaining extended need information that represents the extended intention of the target object.
[0075] According to embodiments of this disclosure, browsing habit attributes can represent the target object's browsing preferences for resource content length, resource playback duration, and resource publisher (e.g., a specific video creator). By utilizing a third model to process resource browsing habit attributes and object environment attributes for extended demand detection, extended demand information can be matched with the target object's browsing habits for resources and the target object's current surrounding environment, thereby improving the match between extended demand information and the extended intention representing the target object's potential needs.
[0076] For example, the third model can be used to understand extended intent by processing the object's environmental attributes ("temperature 36 degrees Celsius, humidity 60%) and reading habit attributes ("preference for blogger A's travel guides"). The resulting extended demand information could include "searching for blogger A's travel guide posts about places recommended for escaping the summer heat." This would allow the extended search results to include page resources related to blogger A's travel guides on the theme of "escaping the summer heat." By using the second model to process the extended search results and the demand intent description text, travel planning information on the theme of "escaping the summer heat" can be generated as response information. This travel planning information can include blogger A's recommended locations and planned routes to meet the target audience's potential needs.
[0077] In some embodiments, the third model can obtain basic requirement information and extended requirement information by executing the first subtask and the second subtask, and the embodiments disclosed herein will not be described in detail here.
[0078] In some embodiments, the third model can be prompted to perform the first and second sub-tasks based on prompts for the first and second sub-tasks, respectively. The task prompts may include arbitrary prompts such as output text character limits and input information types, which will not be elaborated further in the embodiments of this disclosure.
[0079] It should be understood that target demand information includes at least one of basic demand information and extended demand information. In some embodiments, search results include at least one of basic search results and extended search results. Basic search results are retrieved based on basic demand information; extended search results are retrieved based on extended demand information. The second major model can generate response information that can meet the diverse demand intentions of the target object by semantically understanding the text describing the demand intent and at least one of the basic search results and extended search results, thereby improving the diversity, accuracy, and timeliness of the response information content, improving the quality of the response, and reducing the complexity of the interaction.
[0080] In some embodiments, updating the demand information based on resource-related features and object environment attributes related to the demand information to obtain target demand information may further include: using a third model to perform a demand information update task by processing resource-related features, object environment attributes, and object preference description text to obtain target demand information.
[0081] According to embodiments of this disclosure, the object preference description text is determined based on resource-related features. For example, a trained large language model can be used to process resource-related features such as resource content, historical search information, and historical comment data of resources viewed by the target object during a preset historical period to obtain the object preference description text.
[0082] In some embodiments, the object preference description text describes at least one preference attribute of the target object in a structured manner. For example, the object preference description text may include preference attribute names and corresponding descriptive text. For example, the object preference description text may be based on the content in Table 1 below.
[0083] Table 1
[0084]
[0085] By describing one or more ticket attributes of a target object using structured object preference description text, it is possible to summarize and compress resource-related features generated by the target object's interactions over historical periods, and remove noise interference from resource content accidentally clicked by the target object. Thus, while the first and second main models collaborate to process the target object's demand information, the object preference description text can be used to summarize the target object's preference attributes in a fine-grained manner. This allows for the description of the attention and preference intensity of multiple preference attributes in natural language, creating a long-term user profile of the target object. This enables the third main model to clearly understand the target object's preference intent based on structured natural language text, reducing interference from noisy data in resource-related features and mitigating the excessive computational overhead caused by processing resource-related features generated from the target object's long-term interactive behavior. This improves the accuracy of rewriting target demand information and enhances the matching degree between search results and the target object's basic or extended demand intent. Finally, the second main model collaboratively processes search results and demand intent description text to improve the quality of response information.
[0086] In some embodiments, the object preference description text can be determined based on the following operations: based on preference understanding prompts related to multiple preference attributes, semantic understanding of resource-related features and object attribute features of the target object is performed using an expert big model to obtain the object preference description text.
[0087] It should be noted that the object preference description text includes the description text corresponding to the preference attribute, such as the description text corresponding to the preference attribute name.
[0088] According to embodiments of this disclosure, an expert large model is used to understand and extract the semantics of object preference attributes from resource-related features of a target object, and then output object preference description text. Based on preference understanding prompts related to multiple preference attributes, it can be used to prompt semantic understanding and preference attribute description text extraction of resource-related features and object attribute features according to the preference attribute names of multiple preference attributes, and then generate description text corresponding to each of the multiple preference attributes.
[0089] It should be noted that the expert large model can be a trained large model used to understand and extract the semantics of the object preference attributes in resource-related features. For example, it can understand the prompting function of prompt information based on preferences, and output structured description text of object preferences by processing the interactive behavior attribute features in resource-related features such as resource content, browsing time features of target objects for resource content, and browsing frequency features.
[0090] For example, the preference understanding prompt could be: "You need to understand and extract the relevant features of the input resource based on the following six preference attributes to obtain the description text corresponding to each preference attribute name of the target object. The preference attribute names can be interest preference attribute, consumption preference attribute, lifestyle preference attribute, travel decision preference attribute, cultural type preference attribute, and emotional need preference attribute. The output description text is represented in a structured manner using preference attribute names."
[0091] In some embodiments, the object preference description text is determined by semantic fusion of resource-related features and object attribute features of the target object. For example, an expert large model can be used to process resource-related features, object attribute features, and preference understanding prompts to output the object preference description text.
[0092] Figure 3 The illustration shows a schematic diagram of the principle of the multi-model collaboration-based interaction method provided according to an embodiment of the present disclosure.
[0093] like Figure 3 As shown, a first input dataset 310, consisting of the target object's input demand information and other input data, is processed collaboratively by three large models with different functions to obtain response information 321. The first large model, based on intent understanding prompts, performs an intent understanding task on the demand information and resource-related features in the first input dataset 310, obtaining a demand intent description text that represents the target object's degree of demand for resource content. The third large model performs a demand information update task by processing resource-related features, object environment attributes, demand information, and object preference description text in the first input dataset 310, obtaining basic demand information and extended demand information. The object preference description text is determined by processing resource-related features in a preset historical period using an expert large model, and it represents the target object's preference attributes using a structured description method.
[0094] Based on basic and extended demand information, network searches are performed to obtain the corresponding basic and extended search results. The second major model processes the object preference description text, demand information, object environment attributes, demand description text, basic search results, extended search results, and basic and extended demand information from the first input dataset 310 to obtain response information 321 that satisfies the actual needs and intentions of the target object.
[0095] In some embodiments, using the second major model to perform semantic understanding on the demand intent description text and the search results determined based on demand information to obtain response information may also include: using the second major model to perform semantic understanding on the demand intent description text, search results, and object preference description text to obtain response information.
[0096] According to embodiments of this disclosure, the object preference description text describes at least one preference attribute of the target object in a structured manner, while the demand intent description text can describe the target object's demand for resource content and the changes in that demand in a fine-grained manner. Thus, by using a second model to process the demand intent description text and the object preference description text, the target object's preferences and the degree of demand for resource content can be captured more accurately. This enables effective summarization and filtering of resource content in the search results, ensuring that the output response information can more accurately meet the target object's actual preferences and describe and plan resource content with a high degree of demand from the target object, thereby improving the matching degree between the response information and the target object's actual demand intent.
[0097] In one embodiment, the second major model is used to perform semantic understanding on the text describing the demand intent, the basic search results, and the text describing the object preferences. This enables the response information to more accurately meet the basic demand intent expressed by the target object's input information, thereby avoiding the frequency of interaction operations caused by the target object repeatedly inputting new demand information to obtain detailed response information, and improving the user experience.
[0098] In one embodiment, the second major model is used to perform semantic understanding on the text describing the demand intent, the basic search results, the extended search results, and the text describing the object preferences. This enables the response information to more accurately meet the basic demand intent expressed by the demand information input by the target object, as well as the potential demand expressed by the target object through the input demand information. This avoids the target object repeatedly inputting new demand information to obtain response information related to the extended demand intent, thus reducing the frequency of interactive operations and improving the user experience.
[0099] In some embodiments, semantic understanding of the demand intent description text, search results, and object preference description text using the second major model may further include: semantic understanding of the demand intent description text, search results, object preference description text, and object context attributes related to demand information based on the second major model.
[0100] For example, the input data for the second major model can include the demand intent description text, basic search results, extended search results, object preference description text, and object environment attributes related to the demand information. This allows the second major model to perform semantic understanding and fusion of the basic and extended search results, based on a full understanding of the target object's demand for resource content, the target object's actual preferences, and the target object's surrounding environment. This enables the response information to integrate the resource information in the search results according to the target object's demand for resource content and their own preference attributes. Furthermore, it ensures that the response information meets the target object's requirements regarding the current time, geographical location, weather conditions, and other environmental conditions, thereby improving the matching degree between the response information and the target object's actual and potential needs. Ultimately, this generates personalized response information to enhance the interactive experience.
[0101] In one embodiment, the demand intent description text, basic demand information, extended demand information, basic search results, extended search results, object preference description text, and object environment attributes related to the demand information can be used as input data for the second major model, so that the response information can meet the actual needs of the target object.
[0102] In some embodiments, the response information includes trip planning information described in a structured form.
[0103] According to embodiments of this disclosure, the itinerary planning information includes: itinerary items that match the resource content preferred by the target audience, and itinerary planning content corresponding to the itinerary items.
[0104] The itinerary items can include resource content such as attraction names and restaurant names. The corresponding itinerary planning content can include planning content such as ticketing methods, opening hours, and transportation planning.
[0105] In one embodiment, the order of multiple itinerary items can be arranged according to the target object's demand for resource content. The itinerary items and itinerary planning content can be related to the basic demand intent and extended demand intent represented by the demand information input by the target object. This allows the target object to input demand information to output personalized response information that can meet the diverse needs of the target object more accurately, thereby improving the interactive experience.
[0106] Figure 4The illustration shows a schematic diagram of the principle of an interaction method based on multi-model collaboration provided according to another embodiment of the present disclosure.
[0107] like Figure 4 As shown, a second input dataset 410, consisting of the target object's input requirement information and other input data, is processed through collaboration between a first, second, and third large model with different functions to obtain response information 321. The requirement information could be, for example, the text "Where to go this weekend?" input by the target object.
[0108] The first model, based on intent understanding prompts, performs an intent understanding task on the demand information and related resource features related to the demand information in the second input dataset 410, and obtains demand intent description text 421 that can represent the degree of demand of the target object for resource content.
[0109] The demand intent description text 421 could be, for example: "Users might want to learn about tourist attractions in different cities, such as the unique light rail in city A where they can take photos, and especially hope to take a boat tour along the scenic river in city A. In addition, some users might also want to visit the famous temple A in city B." In the demand intent description text 421, "the unique light rail in city A," "boat tour," "scenic river," and "famous temple A" can be resource content. The demand intent description text 421 can use words such as "especially hope" and "may" and syntactic structures to indicate the degree of demand from the target audience for the resource content. It should be noted that the degree of demand for the resource content in the demand intent description text can be expressed in any form of natural language expression; the embodiments of this disclosure do not limit the specific form of the natural language expression.
[0110] The third model performs a demand information update task by processing resource-related features, object environment attributes, demand information, and object preference description text in the second input dataset 410, obtaining basic demand information 431 and extended demand information 432. Basic demand information 431 could be, for example, "good places to go in City A on weekends," while extended demand information 432 could be, for example, "recommendations for boat tours and other attractions." Retrieval is then performed using basic demand information 431 and extended demand information 432 respectively, yielding basic search results and extended search results.
[0111] The second large model processes the object preference description text, demand information, object environment attributes, demand description text, basic search results, extended search results, and basic search information and extended demand information in the second input dataset 410 to obtain itinerary planning information 441 that can meet the actual needs and intentions of the target object as the response information. The itinerary planning information describes the itinerary items and itinerary planning content in a structured form.
[0112] Examples of itinerary items could be "1. City A cruise + famous restaurant", "2. City A riverside park family trip", or "3. City A zoo". Itinerary planning content can be the specific planning details corresponding to each itinerary item. By utilizing the second major model to semantically understand basic and extended search results through demand intent description text, object preference description text, and object environment attributes, and generating structured itinerary planning information, this approach effectively meets users' actual needs, reduces the complexity of interactive operations caused by multiple rounds of interaction to obtain response information, and further improves the user experience.
[0113] Figure 5 A block diagram of an interactive device based on multi-model collaboration according to an embodiment of the present disclosure is shown schematically.
[0114] like Figure 5 As shown, the interactive device 500 based on multi-model collaboration includes: a receiving module 510, a first obtaining module 520, a second obtaining module 530, and a pushing module 540.
[0115] The receiving module 510 is used to receive the requirement information input by the target object.
[0116] The first acquisition module 520 is used to perform intent understanding on demand information and resource-related features using the first large model to obtain demand intent description text. Among them, resource-related features are related to the resources browsed by the target object, and demand intent description text represents the degree of demand of the target object for resource content in natural language form.
[0117] The second acquisition module 530 is used to perform semantic understanding on the demand intent description text and the retrieval results determined based on the demand information using the second major model to obtain response information.
[0118] The push module 540 is used to push reply information to the target object.
[0119] According to embodiments of this disclosure, the second obtaining module includes the first obtaining unit.
[0120] The first obtaining unit is used to perform semantic understanding on the demand intent description text, search results and object preference description text using the second major model to obtain response information. The object preference description text is determined by semantic fusion of resource-related features and object attribute features of the target object. The object preference description text describes at least one preference attribute of the target object in a structured manner.
[0121] According to embodiments of this disclosure, the first obtaining unit includes a semantic understanding subunit.
[0122] The semantic understanding subunit is used to perform semantic understanding based on the second major model of demand intent description text, retrieval results, object preference description text, and object environment attributes related to demand information. The object environment attributes are related to the environment around the target object during the input of demand information.
[0123] According to embodiments of this disclosure, the retrieval results are determined based on the following operations: updating the demand information based on resource-related features and object environment attributes related to the demand information to obtain target demand information, wherein the object environment attributes are related to the environment surrounding the target object during the input of demand information; and performing a retrieval based on the target demand information to obtain detection results.
[0124] According to embodiments of this disclosure, updating demand information based on resource-related features and object environment attributes related to demand information to obtain target demand information includes: using a third model to perform a demand information update task by processing resource-related features, object environment attributes, and object preference description text to obtain target demand information, wherein the object preference description text is determined based on resource-related features, and the object preference description text describes at least one preference attribute of the target object in a structured manner.
[0125] According to embodiments of this disclosure, updating demand information based on resource-related features and object environment attributes related to demand information includes: using a third major model to perform at least one of the following sub-tasks: updating demand information based on semantic features of demand information to obtain basic demand information; detecting resource browsing habits of the target object based on search resource features among resource-related features to obtain resource browsing habit attributes, and detecting extended demand based on resource browsing habit attributes and object environment attributes to obtain extended demand information representing the extended intention of the target object; wherein, the target demand information includes at least one of basic demand information and extended demand information.
[0126] According to embodiments of this disclosure, the search results include at least one of the following: basic search results retrieved based on basic demand information; and extended search results retrieved based on extended demand information.
[0127] According to embodiments of this disclosure, the object preference description text is determined based on the following operations: based on preference understanding prompts related to multiple preference attributes, semantic understanding of resource-related features and object attribute features of the target object is performed using an expert big model to obtain the object preference description text, which includes description text corresponding to the preference attributes.
[0128] According to embodiments of this disclosure, a retrieval based on target demand information to obtain detection results includes: retrieving multiple initial detection results based on target demand information; and determining a detection result from the multiple initial detection results based on the semantic matching degree between the initial detection results and the target demand information.
[0129] According to embodiments of this disclosure, the object environment attributes include at least one of the following attributes related to the process of inputting requirement information for the target object: time attribute, location attribute, and weather condition attribute.
[0130] According to embodiments of this disclosure, the response information includes itinerary planning information described in a structured form, which includes itinerary items that match the resource content preferred by the target object, and itinerary planning content corresponding to the itinerary items.
[0131] According to embodiments of this disclosure, the first obtaining module includes a first execution unit.
[0132] The first execution unit is used to perform intent understanding tasks based on intent understanding prompts and the first main model according to demand information and resource-related features. The intent understanding prompts are used to prompt the first main model to perform intent understanding tasks based on at least one of the following features related to resource content: browsing duration features, browsing frequency features, like behavior features, and comment behavior features.
[0133] According to embodiments of this disclosure, the first execution unit includes a first execution subunit.
[0134] The first execution subunit is used to perform intent understanding tasks based on intent understanding prompts and the first major model, according to the related resource features in the demand information and resource-related features. The related resource features and the demand information meet the preset semantic similarity conditions.
[0135] Figure 6 A schematic block diagram of an artificial intelligence agent according to an embodiment of the present disclosure is shown.
[0136] In embodiments of this disclosure, such as Figure 6 As shown, the AI agent 600 may include an input module 610, a processing module 620, and an output module 630.
[0137] Input module 610 is used to receive input information;
[0138] The processing module 620 is used to determine the target task based on the input information received by the input module, determine the large model based on the target task, and obtain output information by calling the large language model to execute the multi-model collaboration-based interaction method provided in the embodiments of this disclosure.
[0139] Output module 630 is used to output the output information obtained by the processing module.
[0140] According to embodiments of this disclosure, the input module 610 is responsible for receiving or sensing information such as queries, requests, instructions, signals, or data from the outside world (e.g., users or the external environment), and converting it into a format that the AI agent 600 can understand and process. The input module 610 is the primary link for the AI agent 600 to interact with the outside world, enabling the AI agent 600 to efficiently and accurately obtain necessary "sensory" information from the outside world and respond to this information.
[0141] In the example, input module 610 can input the requirement information, resource-related characteristics, etc. described above.
[0142] In the example, processing module 620 is the core support for the AI agent 600's ability to handle complex tasks. Processing module 620 can execute the multi-model collaborative interaction method described above.
[0143] In the example, the performance of processing module 620 is closely related to the large model on which the AI agent 600 is based. To fully leverage the capabilities of the large model, the internal structure of processing module 620 can be designed to be highly configurable and scalable to handle various types of tasks and requirements in real-world scenarios.
[0144] In the example, after the AI agent 600 obtains the demand information, the processing module 620 can use the first major model to process the demand information and resource-related features to obtain the demand intent description text, use the second major model to process the demand intent description text and the search results determined based on the demand information to obtain the response information, and then pass the response information to the output module 630.
[0145] Understandably, while large language models possess excellent language understanding and generation capabilities, like humans, their ability to solve tasks is limited without the aid of any tools. Once the AI agent 600 is given the ability to invoke tools, it can perform tasks such as using a calculator to complete mathematical calculations, using Python to perform data analysis, and using a search engine to generate weather forecasts.
[0146] In the example, output module 630 can output the response information described above.
[0147] The AI agent 600 according to embodiments of this disclosure can simply and effectively improve the level of intelligence, as well as enhance flexibility and versatility.
[0148] According to embodiments of this disclosure, this disclosure also provides an electronic device, a readable storage medium, and a computer program product.
[0149] According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method described above.
[0150] According to embodiments of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions, wherein the computer instructions are used to cause a computer to perform the method described above.
[0151] According to an embodiment of this disclosure, a computer program product includes a computer program that, when executed by a processor, implements the method described above.
[0152] Figure 7 A schematic block diagram of an example electronic device is shown that can be used to implement the multi-model collaboration-based interaction method of embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the present disclosure described and / or claimed herein.
[0153] like Figure 7 As shown, device 700 includes a computing unit 701, which can perform various appropriate actions and processes based on a computer program stored in read-only memory (ROM) 702 or a computer program loaded into random access memory (RAM) 703 from storage unit 708. The RAM 703 may also store various programs and data required for the operation of device 700. The computing unit 701, ROM 702, and RAM 703 are interconnected via bus 704. Input / output (I / O) interface 705 is also connected to bus 704.
[0154] Multiple components in device 700 are connected to I / O interface 705, including: input unit 706, such as keyboard, mouse, etc.; output unit 707, such as various types of monitors, speakers, etc.; storage unit 708, such as disk, optical disk, etc.; and communication unit 709, such as network card, modem, wireless transceiver, etc. Communication unit 709 allows device 700 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.
[0155] The computing unit 701 can be a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the various methods and processes described above, such as multi-model collaboration-based interaction methods. For example, in some embodiments, the multi-model collaboration-based interaction method can be implemented as a computer software program tangibly contained in a machine-readable medium, such as storage unit 708. In some embodiments, part or all of the computer program can be loaded and / or installed on device 700 via ROM 702 and / or communication unit 709. When the computer program is loaded into RAM 703 and executed by the computing unit 701, one or more steps of the multi-model collaboration-based interaction method described above can be performed. Alternatively, in other embodiments, computing unit 701 may be configured to perform an interaction method based on multi-model collaboration by any other suitable means (e.g., by means of firmware).
[0156] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), payload-programmable logic devices (CPLDs), computer hardware, firmware, software, and / or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.
[0157] The program code used to implement the methods of this disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that when executed by the processor or controller, the program code causes the functions / operations specified in the flowcharts and / or block diagrams to be implemented. The program code may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.
[0158] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
[0159] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).
[0160] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as a data server), or computing systems that include middleware components (e.g., an application server), or computing systems that include frontend components (e.g., a user computer with a graphical user interface or web browser through which a user can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
[0161] Computer systems can include clients and servers. Clients and servers are generally located far apart and typically interact via communication networks. Client-server relationships are created by computer programs running on the respective computers and having a client-server relationship with each other. Servers can be cloud servers, distributed system servers, or servers incorporating blockchain technology.
[0162] It should be understood that the various forms of processes shown above can be used to rearrange, add, or delete steps. For example, the steps described in this disclosure can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution disclosed in this disclosure can be achieved, and this is not limited herein.
[0163] The specific embodiments described above do not constitute a limitation on the scope of protection of this disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this disclosure should be included within the scope of protection of this disclosure.
Claims
1. An interaction method based on multi-model collaboration, comprising: Receive the requirement information input by the target object; The first major model is used to perform intent understanding on the demand information and resource-related features to obtain demand intent description text. The resource-related features are related to the resources browsed by the target object, and the demand intent description text represents the degree of demand for resource content by the target object in natural language form. The second major model is used to perform semantic understanding on the text describing the demand intent and the search results to obtain response information; The response information is pushed to the target object; The search results are determined based on the following operations: The third model is used to perform a requirement information update task by processing the resource-related features, object environment attributes related to the requirement information, and object preference description text to obtain target requirement information. The object preference description text is determined based on the resource-related features, the object environment attributes are related to the environment around the target object during the input of the requirement information, and the object preference description text describes at least one preference attribute of the target object in a structured manner. The search is performed based on the target demand information to obtain the search results; The target requirement information includes extension requirement information characterizing the extension intention of the target object, and the extension requirement information is determined based on the following operations: The third model is used to detect the resource browsing habits of the target object based on the search resource features in the resource-related features, so as to obtain the resource browsing habit attributes. Then, based on the resource browsing habit attributes and the object environment attributes, extended demand detection is performed to obtain the extended demand information.
2. The method according to claim 1, wherein, The target requirement information also includes basic requirement information. The third major model is used to perform the following sub-tasks to determine the requirement information: The demand information is updated based on its semantic features to obtain basic demand information.
3. The method according to claim 2, wherein, The search results include at least one of the following: Basic search results retrieved based on the aforementioned basic requirements information; The extended search results retrieved based on the extended demand information.
4. The method according to claim 1, wherein, The object preference description text is determined based on the following operations: Based on preference understanding prompts related to multiple preference attributes, an expert big model is used to perform semantic understanding on the resource-related features and the object attribute features of the target object to obtain the object preference description text, which includes description text corresponding to the preference attributes.
5. The method according to any one of claims 1 to 4, wherein, The process of retrieving the detection results based on the target demand information includes: Based on the target requirement information, a retrieval was performed to obtain multiple initial detection results; and The detection result is determined from multiple initial detection results based on the semantic matching degree between the initial detection results and the target requirement information.
6. The method according to claim 1, wherein, The second major model is used to perform semantic understanding on the text describing the demand intent and the search results determined based on the demand information to obtain response information, including: The second major model is used to perform semantic understanding on the demand intent description text, the search results, and the object preference description text to obtain the response information. The object preference description text is determined by semantic fusion of the resource-related features and the object attribute features of the target object. The object preference description text describes at least one preference attribute of the target object in a structured manner.
7. The method according to claim 6, wherein, The step of using the second major model to perform semantic understanding on the demand intent description text, the search results, and the object preference description text includes: Based on semantic understanding of the demand intent description text, the search results, the object preference description text, and the object environment attributes related to the demand information using the second major model, wherein the object environment attributes are related to the environment surrounding the target object during the input of the demand information.
8. The method according to claim 7, wherein, The object environment attributes include at least one of the following attributes related to the process of the target object inputting the requirement information: Time attribute, location attribute, weather condition attribute.
9. The method according to claim 1, wherein, The response information includes itinerary planning information described in a structured form, which includes itinerary items that match the resource content preferred by the target object, and itinerary planning content corresponding to the itinerary items.
10. The method according to claim 1, wherein, The first major model is used to perform intent understanding on the demand information and resource-related features to obtain demand intent description text, including: Based on intent understanding prompts, the first major model performs an intent understanding task according to the demand information and the resource-related features. The intent understanding prompts are used to prompt the first major model to perform the intent understanding task based on at least one of the following features related to the resource content: Browsing duration characteristics, browsing frequency characteristics, liking behavior characteristics, and commenting behavior characteristics.
11. The method according to claim 10, wherein, The intent-based prompt information, utilizing the first major model to perform intent understanding tasks based on the demand information and the resource-related features, includes: Based on the intent understanding prompts, the first model is used to perform intent understanding tasks according to the demand information and the associated resource features in the resource-related features, wherein the associated resource features and the demand information satisfy a preset semantic similarity condition.
12. An interactive device based on multi-model collaboration, comprising: The receiving module is used to receive the requirement information input by the target object; The first acquisition module is used to perform intent understanding on the demand information and resource-related features using the first major model to obtain demand intent description text, wherein the resource-related features are related to the resources browsed by the target object, and the demand intent description text represents the degree of demand of the target object for resource content based on natural language form; The second acquisition module is used to perform semantic understanding on the demand intent description text and the search results determined based on the demand information using the second major model, and to obtain response information; and The push module is used to push the reply information to the target object; The search results are determined based on the following operations: The third model is used to perform a requirement information update task by processing the resource-related features, object environment attributes related to the requirement information, and object preference description text to obtain target requirement information. The object preference description text is determined based on the resource-related features, the object environment attributes are related to the environment around the target object during the input of the requirement information, and the object preference description text describes at least one preference attribute of the target object in a structured manner. The search is performed based on the target demand information to obtain the search results; The target requirement information includes extension requirement information characterizing the extension intention of the target object, and the extension requirement information is determined based on the following operations: The third model is used to detect the resource browsing habits of the target object based on the search resource features in the resource-related features, so as to obtain the resource browsing habit attributes. Then, based on the resource browsing habit attributes and the object environment attributes, extended demand detection is performed to obtain the extended demand information.
13. The apparatus according to claim 12, wherein, The second obtaining module includes: The first obtaining unit is used to perform semantic understanding on the demand intent description text, the search results, and the object preference description text using the second large model to obtain the response information. The object preference description text is determined by semantic fusion of the resource-related features and the object attribute features of the target object. The object preference description text describes at least one preference attribute of the target object in a structured manner.
14. The apparatus according to claim 13, wherein, The first obtaining unit includes: The semantic understanding subunit is used to perform semantic understanding on the demand intent description text, the search results, the object preference description text, and object environment attributes related to the demand information based on the second major model, wherein the object environment attributes are related to the environment around the target object during the input of the demand information.
15. An intelligent agent of artificial intelligence, comprising: The input module is used to receive input information; The processing module is configured to determine a target task based on the input information received by the input module, determine a large model based on the target task, and obtain output information by calling the large model to execute the method of any one of claims 1 to 11. An output module is used to output the output information obtained by the processing module.
16. An electronic device comprising: At least one processor; as well as A memory communicatively connected to the at least one processor; wherein, The memory stores instructions that can be executed by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 11.
17. A non-transitory computer-readable storage medium storing computer instructions, wherein, The computer instructions are used to cause the computer to perform the method according to any one of claims 1 to 11.
18. A computer program product comprising a computer program that, when executed by a processor, implements the method according to any one of claims 1 to 11.