Interaction method, device, agent, electronic device, storage medium and program product

By clustering and identifying the retrieved information, reliable target retrieved information is selected for feedback, which solves the problem of misleading information caused by false or low-quality retrieved information and improves the accuracy and efficiency of human-computer interaction.

CN122240843A · Pending · Publication Date: 2026-06-19 · BAIDU ONLINE NETWORK TECH (BEIJIBG) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
BAIDU ONLINE NETWORK TECH (BEIJIBG) CO LTD
Filing Date
2026-03-17
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In existing retrieval-augmented generation systems, the search information obtained from online platforms may contain false or low-quality information, which reduces the accuracy and effectiveness of the feedback results in human-computer interaction.

Method used

Multiple search results are clustered, the clusters are checked for anomalies, reliable target search information is selected for reference, and a large language model generates the feedback results.

Benefits of technology

It improves the effectiveness and accuracy of feedback results in human-computer interaction scenarios, avoids misleading issues, and enhances the recognition efficiency of target retrieval information.

✦ Generated by Eureka AI based on patent content.


Abstract

This disclosure provides an interaction method, device, intelligent agent, electronic device, storage medium, and program product, relating to the field of artificial intelligence technology, particularly to the fields of intelligent search, clustering, and deep learning. The specific implementation scheme includes: retrieving input information input through an interactive interface to obtain multiple retrieval information; clustering the multiple retrieval information based on their respective semantic information to obtain clusters; identifying the clusters and determining an identification result indicating whether the clusters are abnormal; determining target retrieval information for reference from the clusters based on the identification result; and generating feedback results for responding to the input information based on the target retrieval information.

Description

Technical Field

[0001] This disclosure relates to the field of artificial intelligence technology, particularly to the fields of intelligent search, clustering, and deep learning, and specifically to interaction methods, devices, intelligent agents, electronic devices, storage media, and program products.

Background Technology

[0002] Retrieval-augmented generation (RAG) technology combines deep learning and information retrieval techniques and applies them to human-computer interaction. It can use retrieved information to guide the generation of results and provide answers that match the question. However, the effectiveness of the retrieved information becomes a limiting factor in outputting accurate answers.

Summary of the Invention

[0003] This disclosure provides an interaction method, apparatus, intelligent agent, electronic device, storage medium, and program product.

[0004] According to one aspect of this disclosure, an interactive method is provided, comprising: retrieving input information input via an interactive interface to obtain multiple retrieval information; clustering the multiple retrieval information based on the semantic information of each of the multiple retrieval information to obtain clusters; identifying the clusters to determine an identification result indicating whether the clusters are abnormal; determining target retrieval information for reference from the clusters based on the identification result; and generating a feedback result for responding to the input information based on the target retrieval information.

[0005] According to another aspect of this disclosure, an interactive device is provided, comprising: a retrieval module for retrieving input information input via an interactive interface to obtain multiple retrieval information; a clustering module for clustering the multiple retrieval information based on the semantic information of each of the multiple retrieval information to obtain clusters; an identification module for identifying the clusters and determining an identification result indicating whether the clusters are abnormal; a determination module for determining target retrieval information for reference from the clusters based on the identification result; and a generation module for generating a feedback result for feeding back the input information based on the target retrieval information.

[0006] According to another aspect of this disclosure, an intelligent agent is provided, comprising: an input module for receiving input information; a processing module for determining a target task based on the input information received by the input module, determining a large model based on the target task, and obtaining output information by calling the large model to execute the method described above; and an output module for outputting the output information obtained by the processing module.

[0007] According to another aspect of this disclosure, an electronic device is provided, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method described above.

[0008] According to another aspect of this disclosure, a non-transitory computer-readable storage medium is provided storing computer instructions, wherein the computer instructions are used to cause a computer to perform the methods described above.

[0009] According to another aspect of this disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the method described above.

[0010] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will become readily apparent from the following description.

Attached Figure Description

[0011] The accompanying drawings are provided to better understand this solution and do not constitute a limitation of this disclosure. Wherein:

[0012] Figure 1 schematically shows an exemplary system architecture to which the interaction method and apparatus can be applied according to embodiments of the present disclosure;

[0013] Figure 2 schematically shows a flowchart of an interaction method according to an embodiment of the present disclosure;

[0014] Figure 3 shows a schematic diagram of the process of determining feedback results according to a related example of this disclosure;

[0015] Figure 4 schematically shows a data flow diagram for determining the identification result according to an embodiment of the present disclosure;

[0016] Figure 5A shows a schematic diagram of clustering results according to embodiments of the present disclosure;

[0017] Figure 5B schematically shows the determination of target retrieval information based on a normal cluster according to an embodiment of the present disclosure;

[0018] Figure 5C schematically shows the determination of target retrieval information based on an abnormal cluster according to an embodiment of the present disclosure;

[0019] Figure 6 shows the interactive interface and interaction process according to embodiments of the present disclosure;

[0020] Figure 7 schematically shows a block diagram of an interaction device according to an embodiment of the present disclosure;

[0021] Figure 8 shows a schematic diagram of the structure of an intelligent agent according to embodiments of the present disclosure; and

[0022] Figure 9 schematically shows a block diagram of an electronic device suitable for implementing an interaction method according to an embodiment of the present disclosure.

Detailed Implementation

[0023] The exemplary embodiments of this disclosure are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding, and should be considered merely exemplary. Therefore, those skilled in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of this disclosure. Similarly, for clarity and brevity, descriptions of well-known functions and structures are omitted in the following description.

[0024] Existing retrieval-augmented generation systems can acquire target information, such as input information entered by users through an interactive interface, retrieve information that matches the input information from various online platforms, and use a Large Language Model (LLM) to generate feedback results for the target information.

[0025] However, the search information on online platforms may be false or of low quality. If such information is input into the large language model as reference content and the result is fed back to the user, the feedback can be misleading.

[0026] In view of this, embodiments of this disclosure provide an interactive method, including: retrieving input information input via an interactive interface to obtain multiple retrieval information; clustering the multiple retrieval information based on the semantic information of each of the multiple retrieval information to obtain clusters; identifying the clusters to determine an identification result indicating whether the clusters are abnormal; determining target retrieval information for reference from the clusters based on the identification result; and generating feedback results for feeding back input information based on the target retrieval information.

[0027] With the interaction method provided in this embodiment, the multiple pieces of search information are clustered to obtain clusters, and each cluster serves as the unit of identification. The number of pieces of target search information used for reference, and the rules for determining them, can be adaptively adjusted according to the identification results. This improves the efficiency of identifying target search information and enhances the effectiveness and accuracy of the feedback results generated from it in human-computer interaction scenarios, thus avoiding misleading feedback.

[0028] Figure 1 schematically depicts an exemplary system architecture to which the interaction method and apparatus can be applied according to embodiments of the present disclosure.

[0029] It should be noted that Figure 1 shows merely one example of a system architecture to which embodiments of this disclosure can be applied, intended to help those skilled in the art understand the technical content of this disclosure. It does not imply that embodiments of this disclosure cannot be used in other devices, systems, environments, or scenarios. For example, in another embodiment, the system architecture may include a terminal device that implements the interaction method and apparatus provided by embodiments of this disclosure without interacting with a server.

[0030] As shown in Figure 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 serves as a medium for providing a communication link between the terminal devices 101, 102, and 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links.

[0031] Users can use terminal devices 101, 102, and 103 to interact with server 105 via network 104 to receive or send messages, etc. Various communication client applications can be installed on terminal devices 101, 102, and 103, such as knowledge reading applications, web browser applications, search applications, instant messaging tools, email clients, and/or social platform software (examples only).

[0032] Terminal devices 101, 102, and 103 can be various electronic devices with displays and web browsing capabilities, including but not limited to smartphones, tablets, laptops, and desktop computers.

[0033] Server 105 can be a server that provides various services, such as a backend management server that supports the content browsed by users using terminal devices 101, 102, and 103 (for example only). The backend management server can analyze and process data such as received user requests, and feed back the processing results (such as web pages, information, or data obtained or generated according to user requests) to the terminal devices.

[0034] It should be noted that the interaction method provided in the embodiments of this disclosure can generally be executed by terminal devices 101, 102, or 103. Accordingly, the interaction device provided in the embodiments of this disclosure can also be disposed in terminal devices 101, 102, or 103.

[0035] Alternatively, the interaction method provided in this embodiment can generally be executed by server 105. Correspondingly, the interaction device provided in this embodiment can generally be located in server 105. The interaction method provided in this embodiment can also be executed by a server or server cluster that is different from server 105 and capable of communicating with terminal devices 101, 102, 103 and/or server 105. Correspondingly, the interaction device provided in this embodiment can also be located in a server or server cluster that is different from server 105 and capable of communicating with terminal devices 101, 102, 103 and/or server 105.

[0036] For example, when a user engages in human-computer interaction, terminal devices 101, 102, and 103 can acquire input information entered by the user through the interactive interface, and then send the input information to server 105. Server 105 performs retrieval based on the input information to obtain multiple search results; clusters the multiple search results based on their respective semantic information to obtain clusters; identifies the clusters and determines an identification result indicating whether the clusters are abnormal; determines, based on the identification result, the target search information for reference from the clusters; and generates, based on the target search information, feedback results for responding to the input information. Alternatively, a server or server cluster capable of communicating with terminal devices 101, 102, and 103 and/or server 105 can analyze the input information and ultimately generate feedback results for responding to the input information.

[0037] It should be understood that the numbers of terminal devices, networks, and servers shown in Figure 1 are merely illustrative. Depending on implementation needs, any number of terminal devices, networks, and servers can be included.

[0038] In the technical solutions disclosed herein, the collection, storage, use, processing, transmission, provision, disclosure, and application of any type of information, such as user personal information, comply with the provisions of relevant laws and regulations, necessary confidentiality measures have been taken, and they do not violate public order and good morals.

[0039] In the technical solution disclosed herein, the user's authorization or consent is obtained before acquiring or collecting the user's personal information.

[0040] It should be noted that the sequence numbers of the operations in the following methods are for descriptive purposes only and should not be considered as indicating the execution order of the operations. Unless explicitly stated otherwise, the method does not need to be executed in the exact order shown.

[0041] Figure 2 schematically shows a flowchart of an interaction method according to an embodiment of this disclosure.

[0042] As shown in Figure 2, the method includes operations S210 to S250.

[0043] In operation S210, retrieval is performed based on the input information entered through the interactive interface to obtain multiple pieces of retrieval information.

[0044] In operation S220, the multiple pieces of retrieval information are clustered based on their respective semantic information to obtain clusters.

[0045] In operation S230, the clusters are identified to determine an identification result indicating whether the clusters are abnormal.

[0046] In operation S240, based on the identification result, target retrieval information for reference is determined from the clusters.

[0047] In operation S250, based on the target retrieval information, feedback results are generated to provide feedback on the input information.
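The five operations can be read as a simple pipeline. The sketch below is only illustrative: the function name `interact` and the five callables passed to it are hypothetical placeholders introduced here, not names used in this disclosure, and each callable stands in for whatever concrete retrieval, clustering, identification, selection, or generation component an implementation would use.

```python
from typing import Callable, List, Sequence


def interact(
    input_info: str,
    retrieve: Callable[[str], List[str]],                  # S210 (hypothetical)
    cluster: Callable[[Sequence[str]], List[List[str]]],   # S220 (hypothetical)
    is_abnormal: Callable[[Sequence[str]], bool],          # S230 (hypothetical)
    select: Callable[[Sequence[str], bool], List[str]],    # S240 (hypothetical)
    generate: Callable[[str, Sequence[str]], str],         # S250 (hypothetical)
) -> str:
    """Run operations S210 to S250 in sequence and return the feedback result."""
    retrieved = retrieve(input_info)            # S210: obtain retrieval information
    clusters = cluster(retrieved)               # S220: group by semantic information
    targets: List[str] = []
    for members in clusters:
        abnormal = is_abnormal(members)         # S230: identify each cluster
        targets.extend(select(members, abnormal))  # S240: pick reference items
    return generate(input_info, targets)        # S250: generate the feedback result
```

Passing the components in as callables keeps the sketch independent of any particular embedding model, clustering algorithm, or large language model.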

[0048] There are no restrictions on the type of input information. For example, it can include text information entered through input boxes on the interactive interface, as well as voice information entered through voice upload controls or image information entered through image upload controls on the interactive interface. Any information that can express the target's interactive intent is acceptable.

[0049] Human-computer interaction can be achieved using large language models from artificial intelligence, but it is not limited to this. Multimodal Large Language Models (MLLMs) can also be used for human-computer interaction. For example, when the input information is text, a large language model can be used directly, while when the input information is speech or image, a multimodal large model can be used.

[0050] Retrieval information can refer to relevant information obtained from sources such as open-source databases and multimedia platforms based on the input information. In human-computer interaction, retrieval is performed when a large model, such as a large language model or a multimodal large model, cannot answer a question from its own model knowledge and needs retrieved knowledge to generate feedback results.

[0051] For example, if the input information indicates a request for the weather in a certain city on a certain date, the weather information for that city on that date can be retrieved from the knowledge base and used as the retrieval information.

[0052] When multiple search results are obtained, their reliability varies, and some may contain false or low-quality information. In such cases, the search results can be filtered to improve the effectiveness and quality of the information input into the larger model.

[0053] Multiple search results can be clustered based on their respective semantic information, so that search results with matching semantic information can be grouped into a single cluster, thereby obtaining at least one cluster.

[0054] For example, given the input "How are AA skincare products?", multiple search results for reviews of "AA skincare products" are obtained. Based on the semantic information of these search results, three clusters can be identified. Cluster A expresses semantic information roughly as "the ingredients used in AA skincare products...", cluster B expresses semantic information roughly as "the main functions of AA skincare products...", and cluster C expresses semantic information roughly as "the cost-effectiveness of AA skincare products...".

[0055] Clusters can be identified individually to determine whether they are abnormal. Abnormal clusters may indicate low reliability of the retrieved information, the potential presence of false information, or low-quality content in the retrieved information. Normal clusters may indicate high-quality, highly reliable, or genuine and valid retrieved information.

[0056] Based on the identification results, target retrieval information for reference is determined from the clusters. The identification results can be used to determine an extraction strategy for extracting target retrieval information from the clusters. The extraction strategy may include at least one of the following: extraction quantity and extraction rules. Target retrieval information is then extracted from the clusters according to the extraction strategy.
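One way to picture an extraction strategy is as a small record holding an extraction quantity and an extraction rule. The sketch below is an assumption-laden illustration: the class `ExtractionStrategy`, the functions `strategy_for` and `extract`, the quantities 3 and 1, and the use of a per-item trust score as the rule are all hypothetical and are not specified in this disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class ExtractionStrategy:
    quantity: int                 # extraction quantity
    rule: Callable[[str], float]  # extraction rule, modeled here as a ranking score


def strategy_for(abnormal: bool, trust: Callable[[str], float]) -> ExtractionStrategy:
    # Assumed policy: take fewer items from an abnormal cluster (1) than
    # from a normal one (3). Both numbers are illustrative only.
    return ExtractionStrategy(quantity=1 if abnormal else 3, rule=trust)


def extract(cluster: Sequence[str], strategy: ExtractionStrategy) -> List[str]:
    """Rank the cluster by the rule and keep the top `quantity` items."""
    ranked = sorted(cluster, key=strategy.rule, reverse=True)
    return ranked[: strategy.quantity]
```

Keeping quantity and rule in one record lets the identification result switch both at once.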

[0057] With the interaction method provided in this embodiment, the multiple pieces of search information are clustered to obtain clusters, and each cluster serves as the unit of identification. The number of pieces of target search information used for reference, and the rules for determining them, can be adaptively adjusted according to the identification results. This improves the efficiency of identifying target search information and enhances the effectiveness and accuracy of the feedback results generated from it in human-computer interaction scenarios, thus avoiding misleading feedback.

[0058] The above explains how clustering is used to determine target retrieval information. A related example of determining feedback results is described below with reference to Figure 3.

[0059] Figure 3 shows a schematic diagram of the process of determining feedback results according to a related example of this disclosure.

[0060] As shown in Figure 3, the target user inputs information through the interactive interface. In response to receiving the input information, multiple search results related to the input information are retrieved from various multimedia platforms.

[0061] As shown in Figure 3, the multiple search results can include comment information from accounts A, B, and C on the multimedia platform.

[0062] As shown in Figure 3, accounts A, B, and C all posted their respective comments on the multimedia platform on the morning of [date], and the semantic information of their comments was largely the same.

[0063] Comments from accounts A, B, and C can be directly used as target search information and combined with the input information to generate feedback results. However, because some platforms lack professional evaluation of the published information, the comments may only represent personal opinions and could be biased or fabricated. Directly using the obtained comment information as target search information to generate feedback results may be misleading.

[0064] Compared with the example shown in Figure 3, the interaction method provided by the embodiments of this disclosure can perform clustering, identification, and filtering operations on the multiple pieces of retrieved search information, thereby improving the effectiveness and reliability of the feedback results generated based on the target search information.

[0065] According to embodiments of this disclosure, operation S210 shown in Figure 2, in which retrieval is performed based on the input information entered through the interactive interface to obtain multiple retrieval information, may include: determining source information based on the intent recognition result of the input information, wherein the source information indicates the source of the retrieval information; and determining multiple retrieval information that matches the input information based on the source information.

[0066] Intent recognition of input information yields intent recognition results that characterize the user's intent. These intent recognition results can indicate the actions to be performed before generating feedback on the input information; these actions can include retrieval, querying, etc.

[0067] For example, when the input information is "1+1 equals what", the intent recognition result represents the user's intent to perform simple arithmetic. Therefore, the intent recognition result indicates that the large model only needs to use its own computing power or memory capacity to generate the feedback result.

[0068] When the input information is "How are AA skincare products?", the intent recognition result indicates that the user's intent is to obtain an evaluation of AA skincare products. Therefore, the intent recognition result indicates that evaluation data needs to be retrieved and feedback results need to be generated.

[0069] Based on the intent recognition results, the source information that needs to be accessed during retrieval can be determined. This source information can be preset and may include the accessible data sources.

[0070] For example, information sources may include university databases, top media outlets, vertical industry portals, thesis databases, industry expert accounts, high-value community accounts, general self-media accounts, and commercial content sources.

[0071] Because the data sources corresponding to the aforementioned information sources are diverse, the methods for obtaining data usually differ depending on the intent recognition result. For example, if the intent recognition result indicates a need to query product reviews, the required data can be obtained from the comment section of commercial content sources or review accounts of self-media. If the intent recognition result indicates a need to query rainfall data, the required data can be obtained from official meteorological platforms.

[0072] Therefore, the operation of determining source information based on the intent recognition result of the input information can specifically include: when the intent recognition result indicates that retrieval is required, determining the source information based on the intent recognition result and the source mapping relationship, wherein the source mapping relationship represents the correspondence between the intent recognition result and the source.

[0073] Based on the intent recognition results and the source mapping relationship, source information that better matches the intent represented by the intent recognition results is obtained, thus standardizing the selection of source information and improving the efficiency of retrieval.
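The source mapping relationship can be pictured as a lookup table from intent to sources. In the minimal sketch below, the intent labels (`product_review`, `rainfall_query`, `simple_arithmetic`), the source identifiers, the `IntentResult` class, and the function `determine_sources` are hypothetical names chosen for illustration; only the pairing of product reviews with commercial comment sections and self-media review accounts, and of rainfall queries with official meteorological platforms, comes from the examples above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class IntentResult:
    label: str             # hypothetical intent label
    needs_retrieval: bool  # whether retrieval is required before answering


# Hypothetical source mapping relationship: intent label -> information sources.
SOURCE_MAP: Dict[str, List[str]] = {
    "product_review": ["commercial_content_comments", "self_media_review_accounts"],
    "rainfall_query": ["official_meteorological_platform"],
}


def determine_sources(intent: IntentResult) -> List[str]:
    """Return the sources to search; an empty list means no retrieval is done."""
    if not intent.needs_retrieval:
        return []  # e.g. simple arithmetic answered from model knowledge
    return list(SOURCE_MAP.get(intent.label, []))
```

A table of this kind makes source selection repeatable for a given intent, which is the standardization effect described above.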

[0074] Once the source information is determined, retrieval can be performed, based on the input information, on the information source corresponding to the source information to obtain multiple retrieval information matching the input information.

[0075] After each information source indicated by the source information has been searched, the search results can be filtered to select a preset number of search results that are most relevant to the input information.
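Selecting a preset number of the most relevant results is a top-k selection. The sketch below assumes a deliberately crude relevance measure (word-overlap Jaccard similarity) purely so that the example is self-contained; the disclosure does not state how relevance is computed, and the names `relevance` and `top_k` are hypothetical.

```python
import heapq
from typing import List


def relevance(query: str, text: str) -> float:
    """Assumed relevance score: Jaccard overlap of lowercase word sets."""
    q, t = set(query.lower().split()), set(text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def top_k(query: str, results: List[str], k: int) -> List[str]:
    """Keep the k results most relevant to the query, best first."""
    return heapq.nlargest(k, results, key=lambda r: relevance(query, r))
```

In practice the score would more plausibly come from the same semantic vectors used for clustering; only the top-k step is the point here.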

[0076] According to embodiments of this disclosure, information sources are selected based on the intent of the input information, making the search results more relevant to the needs and improving the matching degree between the search information and the input information.

[0077] According to embodiments of this disclosure, operation S220 shown in Figure 2, in which the multiple pieces of search information are clustered based on their respective semantic information to obtain clusters, may include: determining the semantic similarity between the multiple pieces of search information based on their respective semantic information to obtain multiple semantic similarities; and clustering the multiple pieces of search information based on the multiple semantic similarities to obtain clusters.

[0078] The semantic information of retrieved information can be obtained through feature extraction. For example, a word embedding model can be used to map the retrieved information entries into high-dimensional vectors, which then serve as their semantic information. Taking K retrieved information entries as an example, the mapping yields a high-dimensional vector set V containing the semantic information of each of the K entries.

[0079] A clustering algorithm, such as density-based clustering (for example, Density-Based Spatial Clustering of Applications with Noise, DBSCAN) or hierarchical clustering, can be used to cluster the vectors in the high-dimensional vector set V, dividing it into a set of M semantic clusters, where M ≤ K.

[0080] For example, based on the semantic information of each of the multiple pieces of search information, the semantic similarity between each pair of them can be calculated. Based on the multiple semantic similarities, the high-dimensional vectors corresponding to the semantic information of each piece of search information can be represented in a high-dimensional vector space. According to the distances between the high-dimensional vectors in that space, M semantic clusters can be determined. The higher the semantic similarity between two pieces of search information, the shorter the distance between their corresponding high-dimensional vectors in the high-dimensional vector space.

[0081] According to embodiments of this disclosure, clustering is performed based on the semantic information of the search information, rather than on the keywords contained in the search information. This enables the formation of semantic clusters in the semantic space based on high-dimensional semantic division, improving the accuracy of clustering and avoiding errors in subsequent screening processes caused by coarse classification based on keywords.
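To make the similarity-then-cluster idea concrete, the sketch below computes pairwise cosine similarity and merges any two vectors whose similarity reaches a threshold. This is a simplified stand-in (threshold-linked connected components via union-find), not DBSCAN or the hierarchical clustering named above, and the function names, the threshold, and the tiny two-dimensional vectors are assumptions for illustration only.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_by_similarity(
    vectors: Sequence[Sequence[float]], threshold: float
) -> List[List[int]]:
    """Group vector indices linked by cosine similarity >= threshold."""
    parent = list(range(len(vectors)))

    def find(i: int) -> int:
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)  # merge the two groups

    groups: Dict[int, List[int]] = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

With K input vectors the function returns M groups of indices with M ≤ K, since every vector lands in exactly one group.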

[0082] According to embodiments of this disclosure, operation S230 shown in Figure 2, in which the clusters are identified to determine an identification result indicating whether the clusters are abnormal, may include: identifying the source information of the search information in the clusters to obtain a source identification result, where the source information indicates the source of the search information; if the source identification result indicates that the clusters are abnormal, identifying the information features of the search information in the clusters to obtain an information feature identification result, where the information features indicate the attributes of the search information; and obtaining an identification result based on the source identification result and the information feature identification result.

[0083] The source identification result can indicate whether the sources of the retrieved information in the cluster are credible and, through analysis of those sources, how the retrieved information is distributed across them.

[0084] If the source identification results indicate that the source of the retrieved information is unreliable, or that the distribution of the retrieved information is abnormal, it can be determined that the source identification results indicate that there is an anomaly in the cluster.

[0085] The information characteristics of retrieved information can include attributes such as the semantics of the retrieved information and the time when the retrieved information was generated.

[0086] Information feature recognition results can be used to indicate whether there are anomalies in the information features of the retrieved information. For example, if the semantics of the retrieved information are too concentrated and the generation time of the retrieved information is too concentrated, the information feature recognition results can indicate that there may be false information being released simultaneously in the retrieved information of that cluster.
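The "too concentrated" test in this example can be sketched as two threshold checks that must both hold. The thresholds below (a mean similarity of 0.95 and a 6-hour window) and the function names are assumed values chosen only to make the example runnable; the disclosure does not give concrete numbers.

```python
from datetime import datetime, timedelta
from typing import Sequence


def times_too_concentrated(times: Sequence[datetime], window: timedelta) -> bool:
    """True when at least two generation times all fall within one window."""
    return len(times) > 1 and (max(times) - min(times)) <= window


def features_abnormal(
    mean_similarity: float,
    times: Sequence[datetime],
    sim_threshold: float = 0.95,             # assumed threshold
    window: timedelta = timedelta(hours=6),  # assumed window
) -> bool:
    # Both conditions must hold: semantics too concentrated AND
    # generation times too concentrated.
    return mean_similarity >= sim_threshold and times_too_concentrated(times, window)
```

Three near-identical comments posted within two hours would be flagged, while the same comments spread over several days would not.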

[0087] If the source identification result indicates that there are no anomalies in the cluster, it means that the source of the retrieved information is credible. Therefore, it can be determined that the identification result indicates that there are no anomalies in the cluster.

[0088] If the source identification result indicates that the cluster is abnormal, but the information feature identification result indicates that the cluster is not abnormal, it can be determined that the identification result indicates that the cluster is not abnormal.

[0089] If both the source identification result and the information feature identification result indicate that there is an anomaly in the cluster, it can be determined that the identification result indicates that there is an anomaly in the cluster.
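The three cases above amount to a two-stage check in which the information features are only examined when the sources look suspicious. A minimal sketch of that logic follows; the function name `identify_cluster` and the callable `feature_check` are hypothetical names introduced here for illustration and do not appear in the disclosure.

```python
from typing import Callable


def identify_cluster(source_abnormal: bool,
                     feature_check: Callable[[], bool]) -> bool:
    """Return True if the cluster is judged abnormal.

    source_abnormal: outcome of the source identification step.
    feature_check: hypothetical callable that runs the information feature
        identification and returns True if it finds an anomaly.
    """
    if not source_abnormal:
        # Sources are credible: the cluster is normal, no feature check needed.
        return False
    # Sources look suspicious: abnormal only if the features agree.
    return feature_check()
```

Passing the feature check as a callable keeps the second stage lazy, which mirrors the text: the information features are identified only if the source identification result indicates an anomaly.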

[0090] According to the embodiments of this disclosure, whether a cluster is abnormal is identified from both source information and information features, realizing multi-dimensional verification of anomaly determination, improving the accuracy of the identification results, and avoiding the one-sidedness of single-dimensional judgment.

[0091] According to embodiments of this disclosure, identifying the source information of retrieval information in a cluster to obtain a source identification result may specifically include: identifying the diversity of the source information of retrieval information in the cluster to obtain a first source identification result; identifying the specialization of the source information of retrieval information in the cluster to obtain a second source identification result; and obtaining a source identification result based on the first and second source identification results.

[0092] The diversity of information sources for multiple retrievals within a cluster can be determined by the entropy of the information sources, which characterizes the degree of uncertainty or disorder in the information.

[0093] The greater the entropy of the information source, the more likely that the multiple retrieval information in a cluster comes from multiple different types of information sources. Since the semantic information of multiple retrieval information in a cluster is similar, it can be determined that the above semantic information can be supported by multiple different types of information sources, thus the credibility is high.

[0094] The lower the entropy of the information source, the more singular the source of the multiple search results within the cluster. For example, if multiple search results mostly come from accounts on the same self-media platform, the search results are more likely to have been mass-produced manually for marketing or promotion, and therefore have lower credibility.

[0095] Generally, the credibility of data sources corresponding to different information sources also varies. Therefore, in addition to identifying the diversity of information sources, we can also identify information sources based on the professionalism of the information sources of the retrieved information, and obtain a second information source identification result.

[0096] Specifically, source information can be divided into multiple levels according to its professionalism, with a different professionalism score assigned to each level. For a cluster, the scores corresponding to the source information of the retrieval information items in the cluster are weighted and summed to obtain the second source identification result.

[0097] Information sources can be categorized by professionalism as follows: the first tier includes university databases and top media outlets; the second tier includes vertical industry portals, thesis databases, and industry expert accounts; the third tier includes high-value community accounts; the fourth tier includes general self-media accounts; and the fifth tier includes commercial content sources.

[0098] The reciprocal of the level at which each piece of source information is located can be used as its professionalism score. For example, the professionalism score is 1 for the first level, 1/2 for the second level, 1/3 for the third level, 1/4 for the fourth level, and 1/5 for the fifth level. This is only one way of quantifying professionalism; the scores for the different levels of source information can be set freely according to actual task requirements.

[0099] In one example, a cluster contains 10 retrieval information items. Among them, the source information of 3 items is at the third level, the source information of 2 items is at the fourth level, and the source information of 5 items is at the fifth level. The second source identification result of this cluster is then (3 × 1/3 + 2 × 1/4 + 5 × 1/5) / 10 = 0.25.
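The arithmetic of this example can be checked with a short sketch. It assumes the reciprocal-of-level scoring described above and equal weights for all items (a plain average); the function name `professionalism_score` is a hypothetical name used only for illustration.

```python
from fractions import Fraction


def professionalism_score(levels):
    """Average of the reciprocal source levels of the items in one cluster."""
    if not levels:
        raise ValueError("cluster must contain at least one item")
    # Fraction keeps the arithmetic exact: 1/3 + 1/3 + 1/3 is exactly 1.
    return sum(Fraction(1, level) for level in levels) / len(levels)


# 3 items at level 3, 2 items at level 4, 5 items at level 5.
levels = [3] * 3 + [4] * 2 + [5] * 5
print(float(professionalism_score(levels)))  # 0.25
```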

[0100] After obtaining the first source identification result and the second source identification result, the first source identification result and the second source identification result can be weighted and summed to obtain the source identification result.

[0101] The source identification result can be compared with the source threshold. If the source identification result is less than the source threshold, it is determined that the source identification result indicates that there is an anomaly in the cluster.
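The weighted sum and threshold comparison can be sketched as follows. The weights `w_div` and `w_pro` and any threshold value are illustrative assumptions, since the disclosure does not give concrete values.

```python
def source_identification(diversity: float, professionalism: float,
                          w_div: float = 0.5, w_pro: float = 0.5) -> float:
    """Weighted sum of the first (diversity) and second (professionalism) results."""
    return w_div * diversity + w_pro * professionalism


def source_is_abnormal(result: float, threshold: float) -> bool:
    """A result below the source threshold indicates an abnormal cluster."""
    return result < threshold
```

For instance, with the assumed equal weights, a diversity result of 0.2 and a professionalism result of 0.25 give about 0.225, which would be flagged as abnormal under an assumed threshold of 0.4.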

[0102] According to embodiments of this disclosure, a first source identification result is obtained by identifying the diversity of information sources, which reflects the richness of information sources for the retrieved information in the cluster. A second source identification result is determined by the professionalism of the source information, reflecting the quality of the retrieved information through the professionalism of the source. Combining these two results to determine the source identification result allows for a comprehensive evaluation of the cluster in terms of both source diversity and source professionalism, providing more accurate support for subsequent cluster anomaly identification.

[0103] According to embodiments of this disclosure, identifying the diversity of source information of retrieval information in a cluster to obtain a first source identification result includes: classifying the source information of each retrieval information in the cluster into categories to obtain source category results; and determining the first source identification result based on the source category results and the number of retrieval information in the cluster.

[0104] By statistically analyzing the source information of each retrieved item within a cluster and classifying each source information into categories, we can obtain source category results that represent the quantity of each source information. Based on these source category results, we can determine the quantity of retrieved items originating from each source information within the cluster.

[0105] Based on the source category results and the amount of retrieval information in the clusters, the entropy of the retrieval information can be calculated to obtain the first source identification result. As shown in equation (1):

[0106] (1)

[0107] In equation (1), the terms denote, respectively: the j-th cluster; the number of distinct source information items involved in the retrieval information within the cluster; the proportion of retrieval information originating from the i-th source out of the total number of retrieval information items in the cluster; the number of retrieval information items in the cluster; and λ, the small-sample penalty coefficient. When the cluster contains few retrieval information items, the small-sample penalty coefficient can be set to a larger value so that the calculated result is smaller, indicating that the number of samples in the cluster is too small and the confidence is low. When the cluster contains many retrieval information items, the small-sample penalty coefficient can be set to a smaller value so that the calculated result reflects the true distribution of the source information.
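As a rough illustration of a source entropy with a small-sample penalty, the sketch below uses Shannon entropy over the source categories and multiplies it by n / (n + λ). Both the use of the natural logarithm and this particular penalty factor are assumptions made here for illustration; the exact form of equation (1) may differ. The function names are hypothetical.

```python
import math
from collections import Counter


def source_entropy(sources):
    """Shannon entropy (natural log) of the source categories in one cluster."""
    n = len(sources)
    counts = Counter(sources)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def first_source_result(sources, lam=1.0):
    """Entropy damped by an assumed small-sample factor n / (n + lam)."""
    n = len(sources)
    if n == 0:
        return 0.0
    # Fewer items (small n) or a larger lam pulls the result toward 0.
    return source_entropy(sources) * n / (n + lam)
```

With this assumed factor, two clusters with the same source proportions receive different results when their sizes differ: the smaller cluster scores lower, reflecting lower confidence.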

[0108] According to embodiments of this disclosure, the distribution of information sources is clarified through the information source category results. Based on the information source category results and the quantity of retrieval information in the clusters, the entropy of the retrieval information is determined, and the entropy of the retrieval information is used as the first information source identification result, providing a unified standard for the quantification of information source diversity and realizing standardized diversity identification.

[0109] When the source identification result indicates that the cluster is abnormal, there may be problems with how the retrieval information in the cluster is distributed across sources or with the sources themselves, so the retrieval information cannot be guaranteed to be fully reliable. Therefore, the information features of the retrieval information can be identified, and the cluster can be further identified by taking the information feature identification result into account.

[0110] According to embodiments of this disclosure, the information features of the retrieved information in the cluster are identified to obtain information feature identification results, including: identifying the publication time period of the retrieved information in the cluster to obtain publication feature identification results; identifying the topic information of the retrieved information in the cluster to obtain topic diversity identification results; and obtaining information feature identification results based on the publication feature identification results and the topic diversity identification results.

[0111] Normally, even with breaking news, the content and discussions posted by regular users and media will accumulate naturally over time. However, content generated through manipulation such as fake comments and marketing will be released in a concentrated manner within a short period. Therefore, we can identify the publication time periods of the retrieved information within clusters to obtain publication feature identification results.

[0112] The publication feature identification result can be expressed as a fraction whose denominator is the sum of two terms. The first term indicates how concentrated the publication periods of the retrieval information in the cluster are, and can be expressed as the standard deviation of the publication periods of the multiple retrieval information items. The second term is a very small value greater than 0, used to prevent the denominator from being 0. The larger the publication feature identification result, the more concentrated the publication periods of the retrieval information in the cluster.

[0113] Furthermore, for the same topic, content and discussions posted by normal users and media outlets typically differ in focus, content structure, and other aspects of expression. Content generated through methods such as spamming and marketing, however, is usually produced in batches, resulting in higher content similarity. Therefore, the topic information of the retrieval information in the cluster can be identified to obtain a topic diversity identification result.

[0114] The topic diversity identification result can be represented as the average cosine similarity between the topic information of the multiple retrieval information items. The greater the topic diversity identification result, the more similar the topic information of the retrieval information in the cluster.

[0115] After obtaining the publication feature identification results and the topic diversity identification results, the publication feature identification results and the topic diversity identification results can be weighted and summed to obtain the information feature identification results.

[0116] The information feature recognition result can be compared with the information feature threshold. If the information feature recognition result is greater than the information feature threshold, it is determined that the information feature recognition result indicates that there is an anomaly in the cluster.
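The two feature measures and their combination can be sketched as below. The sketch assumes a numerator of 1 for the publication term, numeric publication times in a common unit (for example hours), topic information already embedded as vectors, and illustrative weights; none of these specifics are stated in the disclosure, and all function names are hypothetical.

```python
import math
import statistics
from itertools import combinations


def publication_concentration(times, eps=1e-6):
    """Reciprocal of the spread of publication times (numerator 1 is assumed)."""
    sigma = statistics.pstdev(times)  # population standard deviation
    return 1.0 / (sigma + eps)        # eps keeps the denominator non-zero


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def topic_similarity(topic_vectors):
    """Average cosine similarity over all pairs of topic vectors."""
    pairs = list(combinations(topic_vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def feature_result(times, topic_vectors, w_pub=0.5, w_topic=0.5):
    """Weighted sum of the publication and topic results (weights assumed)."""
    return (w_pub * publication_concentration(times)
            + w_topic * topic_similarity(topic_vectors))


def feature_is_abnormal(result, threshold):
    """A result above the information feature threshold indicates an anomaly."""
    return result > threshold
```

Because the publication term is unbounded while cosine similarity lies in [-1, 1], the weights and the threshold would need to be chosen with the two scales in mind.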

[0117] According to embodiments of this disclosure, by combining multiple features of maliciously generated information, the information features of retrieved information in clusters are identified, and the information features of retrieved information are evaluated from multiple aspects, thereby improving the identification accuracy of clusters.

[0118] Figure 4 schematically shows a data flow diagram for determining the identification result according to an embodiment of the present disclosure.

[0119] As shown in Figure 4, for cluster 401, the source information 402, information features 403, and quantity 404 of the retrieval information in cluster 401 are determined. The information features 403 include the publication period 4031 and the topic information 4032 of the retrieval information.

[0120] To perform diversity identification on the source information 402, the source information 402 is first classified into categories to obtain source category results 405. Based on the source category results 405 and the number of retrieved information 404, the first source identification result 406 of the source information 402 is calculated.

[0121] Professionalism identification is performed on the source information 402 to obtain the second source identification result 407.

[0122] Based on the first source identification result 406 and the second source identification result 407, the source identification result 408 is obtained.

[0123] If the source identification result 408 indicates that cluster 401 is abnormal, the publication period 4031 is identified to obtain the publication feature identification result 409, and the topic information 4032 is identified to obtain the topic diversity identification result 410.

[0124] Based on the publication feature identification result 409 and the topic diversity identification result 410, the information feature identification result 411 is determined.

[0125] Based on the source identification result 408 and the information feature identification result 411, the identification result 412 is determined.

[0126] In one example, after obtaining the source identification result and the information feature identification result, the identification result can be determined based on the first source identification result and the information feature identification result. As shown in equation (2):

[0127] (2)

[0128] Where α, β, and γ are adjustable weighting coefficients. If the value of the identification result exceeds a preset identification threshold, it can be determined that the identification result indicates an abnormal cluster.

[0129] The identification result is determined by combining the information feature identification result with the first source identification result. Combining the concentration of the source information with the information features in this way yields a more reliable identification result.

[0130] Figure 5A schematically shows clustering results according to an embodiment of the present disclosure.

[0131] As shown in Figure 5A, clustering is performed on the multiple retrieval information items based on their respective semantic information. The clustering results include a first cluster 510, a second cluster 520, a third cluster 530, a first outlier 540, a second outlier 550, a third outlier 560, and a fourth outlier 570. The outliers have low similarity to the existing clusters, making it difficult to assign them to any single cluster.

[0132] The semantic information corresponding to each of the above outliers appears in isolation; that is, among the multiple pieces of semantic information, none is highly similar to it. However, some retrieval information obtained from sources with high professionalism scores may not yet have been widely disseminated because it is very recent, so its semantic information appears alone as an outlier.

[0133] Therefore, for the above outliers, before performing operation S240 shown in Figure 2, the interaction method further includes: when the multiple retrieval information items include outlier retrieval information that has not been clustered into any cluster, determining target retrieval information from the outlier retrieval information based on the source information of the outlier retrieval information.

[0134] For each outlier, the semantic information corresponding to the outlier can be determined, and the retrieval information corresponding to the semantic information is identified as the outlier retrieval information. For the outlier retrieval information, the source information of the outlier retrieval information is obtained.

[0135] If the professionalism score of the source information is determined to be higher than a preset threshold, outlier search information can be identified as target search information.

[0136] For example, outliers whose source information belongs to the first or second level can be used as target retrieval information. Outliers whose source information belongs to neither the first nor the second level can be treated as noise samples.
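The outlier rule can be sketched as a simple partition. The sketch assumes the reciprocal-of-level professionalism score and an illustrative threshold of 0.4, chosen here so that only levels 1 and 2 pass; the `Retrieved` type and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Retrieved:
    text: str
    source_level: int  # 1 = most professional tier, 5 = least


def split_outliers(outliers, min_score=0.4):
    """Partition outliers into (targets, noise) by professionalism score."""
    targets, noise = [], []
    for item in outliers:
        score = 1.0 / item.source_level  # reciprocal-of-level score
        (targets if score > min_score else noise).append(item)
    return targets, noise
```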

[0137] According to embodiments of this disclosure, outlier search information is selected by leveraging the source information of outlier search information. When the source of the outlier search information is sufficiently credible and professional, omissions in the search information are avoided, improving the comprehensiveness and quality of the target search information. This, in turn, enhances the accuracy and comprehensiveness of feedback information during subsequent interactions.

[0138] According to embodiments of this disclosure, based on the identification results, target retrieval information for reference is determined from the clusters, including: when the identification results indicate that the clusters are normal, retrieval information whose semantic vectors are located within the central range of the clusters is determined from the clusters as target retrieval information, wherein the central range indicates that the similarity between the semantic vector and the cluster center is greater than a first similarity threshold, and the cluster center is determined based on the semantic vectors of each retrieval information in the clusters.

[0139] If the identification results indicate that the clusters are normal, the retrieval information corresponding to the cluster centers can be used as a reference to generate feedback results.

[0140] Since the cluster center can represent the core semantics of the cluster, under the condition that the cluster is normal, the cluster center can be considered to be the content that is related to and reliable to the input information. Therefore, the retrieval information whose semantic vector is located in the center range of the cluster can be determined as the target retrieval information.

[0141] The similarity between the semantic vectors at the edge of the central range and the cluster center is the first similarity threshold. Therefore, the similarity between the semantic vectors at the central range and the cluster center is greater than the first similarity threshold.
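A minimal sketch of selecting the central range follows. It assumes that similarity is cosine similarity and that the cluster center is the mean of the cluster's semantic vectors; the disclosure only states that the center is determined from those vectors, so both choices are illustrative, as are the function names.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of the vectors (assumed definition of the center)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def central_range(vectors, first_threshold):
    """Indices of vectors whose similarity to the center exceeds the threshold."""
    center = centroid(vectors)
    return [i for i, v in enumerate(vectors)
            if cosine(v, center) > first_threshold]
```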

[0142] According to embodiments of this disclosure, by identifying retrieval information from normal clusters that has a similarity greater than a first similarity threshold to the cluster center as target retrieval information, it is ensured that the semantic vector of the target retrieval information has a high similarity to the semantic vector of the cluster center, thereby ensuring that the target retrieval information has strong representativeness to the cluster and strong correlation with the input information, thus ensuring the effectiveness and accuracy of the feedback results.

[0143] According to embodiments of this disclosure, the interaction method further includes: when there are multiple retrieval information items whose semantic vectors are located within the central range of the cluster, determining, from these retrieval information items and based on their respective first evaluation results, a number of target retrieval information items matching the identification result, wherein the first evaluation result is determined based on at least one of the following: the similarity between the retrieval information and the input information, and the source information of the retrieval information.

[0144] Since the identification result indicates that the cluster is normal, the number matching the identification result can be set relatively large, so as to increase the proportion of retrieval information from normal clusters among the finally selected target retrieval information and increase its influence on the feedback result.

[0145] The similarity between the retrieval information and the input information can be determined through semantic similarity. The source information of the retrieval information can be represented by the professionalism score of that source information. Furthermore, the first evaluation result can also be determined based on the distance between the retrieval information and the cluster center.

[0146] For example, the first evaluation result of a retrieval information item can be determined by equation (3):

[0147] (3)

[0148] In equation (3), the terms denote, respectively: the semantic similarity between the retrieval information and the input information; the professionalism score corresponding to the source information of the retrieval information; and the distance between the retrieval information and the cluster center. The remaining coefficients are all adjustable weighting coefficients.

[0149] Figure 5B schematically shows the determination of target retrieval information based on a normal cluster according to an embodiment of the present disclosure.

[0150] As shown in Figure 5B, taking as an example the case where the identification result of the first cluster 510 in Figure 5A indicates that it is normal, the central range 512 can be determined according to the first cluster center 511 of the first cluster 510 and the vector distance corresponding to the first similarity threshold.

[0151] The first cluster 510 includes multiple semantic vectors of retrieval information. Among them, there are 6 semantic vectors located in the central range 512, namely the first cluster center 511, the first semantic vector 513, the second semantic vector 514, the third semantic vector 515, the fourth semantic vector 516, and the fifth semantic vector 517.

[0152] Taking as an example the case where the number matching the identification result is 3, the first evaluation results of the retrieval information items located in the central range of the cluster can be calculated respectively, and the 3 retrieval information items with the highest first evaluation results can be selected from them, namely the retrieval information corresponding to the first semantic vector 513, the retrieval information corresponding to the third semantic vector 515, and the retrieval information corresponding to the sixth semantic vector 518. These retrieval information items are used as the target retrieval information.
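The selection step can be sketched as a score-and-rank operation. The scoring function below is a hypothetical weighted combination in which similarity and professionalism add to the score and distance to the cluster center subtracts from it; the sign of the distance term, the weights, and all names are assumptions, and the actual equation (3) may differ.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    similarity: float        # semantic similarity to the input information
    professionalism: float   # professionalism score of the source
    center_distance: float   # distance to the cluster center


def first_evaluation(c, w_sim=0.5, w_pro=0.3, w_dist=0.2):
    """Assumed form: reward similarity and professionalism, penalize distance."""
    return (w_sim * c.similarity + w_pro * c.professionalism
            - w_dist * c.center_distance)


def select_top(candidates, count):
    """Keep the `count` candidates with the highest first evaluation result."""
    return sorted(candidates, key=first_evaluation, reverse=True)[:count]
```

Here `count` plays the role of the number matching the identification result, so a normal cluster would typically be given a larger `count` than an abnormal one.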

[0153] According to embodiments of this disclosure, a first evaluation result is determined based on the similarity between the retrieved information and the input information, as well as the source information of the retrieved information. This ensures the comprehensiveness of the first evaluation result and avoids bias caused by a single evaluation criterion. The retrieved information is then filtered based on the first evaluation result, ensuring that the filtered target retrieved information possesses both high relevance to the input information and high credibility. Furthermore, selecting a number of target retrieved information pieces that match the identification results allows for the selection of a larger number of target retrieved information pieces when the clustering is normal, thereby increasing the impact of retrieved information in normal clusters on the feedback results.

[0154] According to embodiments of this disclosure, based on the identification results, target retrieval information for reference is determined from the clusters, including: when the identification results indicate that the clusters are abnormal, retrieval information in which the semantic vector is within the boundary range of the clusters is determined as target retrieval information, wherein the boundary range indicates that the similarity between the semantic vector and the cluster center of the cluster is greater than a second similarity threshold and less than or equal to a first similarity threshold.

[0155] If the identification result indicates that the cluster is abnormal, the retrieval information corresponding to the cluster center should not be used as a reference for generating the feedback result.

[0156] When a cluster is abnormal, the cluster center can be considered to represent content that is related to the input information but unreliable. Therefore, retrieval information located within the boundary range of the cluster can be selected as target retrieval information, which reduces the similarity between the target retrieval information and the cluster center and thereby lowers the risk of selecting unreliable retrieval information.

[0157] The outer edge of the boundary range is the edge of the cluster. The similarity between the semantic vector at the edge of the cluster and the cluster center is the second similarity threshold. The similarity between the semantic vector at the inner edge of the boundary range and the cluster center is the first similarity threshold. Therefore, the similarity between the semantic vector at the boundary range and the cluster center is greater than the second similarity threshold and less than or equal to the first similarity threshold.

[0158] Since the similarity between the semantic vectors in the clusters and the cluster centers is greater than or equal to the second similarity threshold, in actual selection, it is only necessary to ensure that the retrieval information corresponding to the semantic vectors with a similarity to the cluster centers less than or equal to the first similarity threshold is selected as the target retrieval information.
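The boundary range condition can be sketched as a filter over precomputed similarities to the cluster center. The function name and the example threshold values are illustrative assumptions.

```python
def boundary_range(similarities, first_threshold, second_threshold):
    """Indices whose similarity s satisfies second_threshold < s <= first_threshold."""
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must be below the first threshold")
    return [i for i, s in enumerate(similarities)
            if second_threshold < s <= first_threshold]


# Similarity of each cluster member to the cluster center (made-up values).
print(boundary_range([0.95, 0.8, 0.7, 0.6], 0.8, 0.6))  # [1, 2]
```

When every member of the cluster already exceeds the second threshold, the lower bound is always satisfied and only the comparison against the first threshold has any effect, as noted above.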

[0159] According to embodiments of this disclosure, by selecting retrieval information from abnormal clusters that has a similarity to the cluster center greater than a second similarity threshold and less than or equal to a first similarity threshold as target retrieval information, it is ensured that the semantic vector of the target retrieval information has a low similarity to the semantic vector of the cluster center. This avoids the target retrieval information from being too close to untrusted semantic vectors, which could affect the feedback results, thereby further ensuring the effectiveness and accuracy of the feedback results.

[0160] According to embodiments of this disclosure, the interaction method further includes: when there are multiple retrieval information whose semantic vectors are within the boundary range of a cluster, determining a number of target retrieval information that match the identification result from the multiple retrieval information based on the second evaluation results of each of the multiple retrieval information, wherein the second evaluation results are determined based on the source information of the retrieval information.

[0161] Since the identification result indicates that the cluster is abnormal, the number matching the identification result can be set relatively small. This reduces the proportion of retrieval information from abnormal clusters among the finally selected target retrieval information, and thus reduces its influence on the feedback result. For example, the number matching the identification result can be set to 1 or 0.

[0162] Setting the number to 0 discards the abnormal cluster entirely, which completely prevents the retrieval information in the abnormal cluster from contaminating the target retrieval information.

[0163] Setting the number to 1 selects one relatively marginal retrieval information item from the abnormal cluster, which improves the richness of the target retrieval information while keeping the probability of introducing noisy data low.

[0164] In one example, the second evaluation result can be determined based on the source information of the retrieval information. For example, if the multiple retrieval information items within the boundary range of the abnormal cluster include an item whose source is at the first level, that item can be used as the target retrieval information.

[0165] In another example, the second evaluation result of a retrieval information item can be calculated in a manner similar to the first evaluation result, as shown in equation (4):

[0166] (4)

[0167] In equation (4), the terms denote, respectively: the semantic similarity between the retrieval information and the input information; the professionalism score corresponding to the source information of the retrieval information; and the distance between the retrieval information and the cluster center. The remaining coefficients are all adjustable weighting coefficients.

[0168] Figure 5C schematically shows the determination of target retrieval information based on an abnormal cluster according to an embodiment of the present disclosure.

[0169] As shown in Figure 5C, taking as an example the case where the identification result of the third cluster 530 in Figure 5A indicates that it is abnormal, the boundary range 532 can be determined based on the third cluster center 531 of the third cluster 530, the vector distance corresponding to the first similarity threshold, and the vector distance corresponding to the second similarity threshold.

[0170] The third cluster 530 includes multiple semantic vectors of retrieval information. Among them, there are three semantic vectors located in the boundary range 532, namely the sixth semantic vector 533, the seventh semantic vector 534, and the eighth semantic vector 535.

[0171] Taking as an example the case where the number matching the identification result is 1, the second evaluation results of the retrieval information corresponding to the semantic vectors within the boundary range of the cluster can be calculated respectively, and the retrieval information with the highest second evaluation result, namely the retrieval information corresponding to the eighth semantic vector 534, can be selected as the target retrieval information.

[0172] According to embodiments of this disclosure, a second evaluation result is determined based on the similarity between the retrieved information and the input information, as well as the source information of the retrieved information. This ensures the comprehensiveness of the second evaluation result and avoids bias caused by a single evaluation criterion. Based on the second evaluation result, the retrieved information in abnormal clusters is filtered to obtain target retrieved information with a significant semantic difference from the target retrieved information already present in normal clusters, thus improving the richness of the target retrieved information. Furthermore, by selecting a number of target retrieved information pieces that match the identification results, a smaller number of target retrieved information pieces can be selected in cases of abnormal clusters, or the retrieved information in abnormal clusters can be directly ignored, thereby reducing the impact of the retrieved information in abnormal clusters on the feedback results.

[0173] According to embodiments of this disclosure, generating feedback results for input information based on target retrieval information includes: inputting the target retrieval information and input information into a large model to generate feedback results.

[0174] Figure 6 schematically illustrates the interactive interface and interaction process according to embodiments of the present disclosure.

[0175] As shown in Figure 6, after input information is received via the interactive interface, it is fed into the large model, which performs retrieval based on the input information to obtain multiple retrieval information. The model then performs clustering, identification, and filtering operations on the retrieval information to determine the target retrieval information from the multiple retrieval information.

[0176] The target retrieval information is input into the large model. Utilizing the natural language processing capabilities of the large model, based on the target retrieval information and the input information, content related to the input information is extracted from the target retrieval information, organized, and feedback results are generated.
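
One way to combine the target retrieval information and the input information before calling the large model is sketched below. The prompt wording, the function names, and the `stub_model` callable are assumptions made for illustration; the disclosure does not specify a prompt format or a model interface.

```python
def build_prompt(input_info, target_infos):
    """Assemble a single prompt from the input and the numbered references."""
    numbered = "\n".join(
        f"[{i}] {text}" for i, text in enumerate(target_infos, start=1)
    )
    return (
        "Answer the question using only the reference material below.\n"
        f"Question: {input_info}\n"
        f"References:\n{numbered}\n"
        "Answer:"
    )

def generate_feedback(input_info, target_infos, large_model):
    """`large_model` is any callable mapping a prompt string to a reply string."""
    return large_model(build_prompt(input_info, target_infos))

# Stand-in for a real model: reports how many references it was given.
stub_model = lambda prompt: (
    "stub answer based on " + str(prompt.count("\n[")) + " references"
)

prompt = build_prompt(
    "How is xx face cream?",
    ["Whitening effect reported.", "Allergy reports exist."],
)
feedback = generate_feedback(
    "How is xx face cream?",
    ["Whitening effect reported.", "Allergy reports exist."],
    stub_model,
)
```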

[0177] Once the feedback results are generated, they can be output via the interactive interface.

[0178] According to embodiments of this disclosure, by processing target retrieval information and input information using a large model, further filtering can be performed on the target retrieval information to obtain information relevant to the input information, thereby generating feedback results that are clearly used to respond to the input information. This improves the relevance between the feedback results and the input information and the overall feedback quality, thus optimizing the user's interactive experience.

[0179] In one example, the input information entered through the interactive interface is "How is xx face cream?". After retrieving the input information and obtaining multiple search results, the search results are filtered based on the similarity between the search results and the input information, and the 40 search results with the highest similarity are retained.
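
The pre-filtering step in this example (keeping the results most similar to the input) could look like the sketch below. The function name and the similarity table are hypothetical; any similarity function that maps a result to a number would fit.

```python
import heapq

def retain_top_k(results, similarity, k=40):
    """Keep the k results with the highest similarity to the input information."""
    return heapq.nlargest(k, results, key=similarity)

# Hypothetical similarity scores between each result and the input.
similarity_to_input = {"a": 0.2, "b": 0.9, "c": 0.5}
retained = retain_top_k(["a", "b", "c"], similarity_to_input.get, k=2)
# retained == ["b", "c"]
```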

[0180] Based on the semantic information of the 40 retained search results, the search results are clustered, resulting in three clusters and one outlier search result. The first cluster includes 30 search results, and the semantic information represented by its cluster center is that the face cream has a very good whitening effect and is recommended for purchase. The second cluster includes nine search results, and the semantic information represented by its cluster center is that the face cream contains hormones and is likely to cause allergies. The third cluster includes three search results, and the semantic information represented by its cluster center is the query result for the face cream's registration with the drug regulatory authority.

[0181] The first cluster (the 30 search results recommending the face cream) is identified. Its retrieval information all comes from fourth-level sources, so the first source identification result calculated according to equation (1) is low. The content is highly homogeneous and the publication times are relatively concentrated, so the calculated identification result score is high and exceeds the preset identification threshold. Therefore, the search information within this cluster is highly likely to have been generated through methods such as fake reviews and marketing.

[0182] The second cluster (the nine search results about hormones and allergies) is identified. The source information of its retrieval information is at the second, third, and fourth levels, and the sources are relatively dispersed, so the first source identification result calculated according to equation (1) is relatively high.

[0183] The third cluster (the three registration query results) is identified. Its retrieval information all comes from first-level sources, and the first source identification result calculated according to equation (1) is relatively high.

[0184] Outlier search information is identified and determined to originate from the fourth level; therefore, this outlier search information is not considered as the target search information.

[0185] Based on the above identification results, the first cluster is determined to be an abnormal cluster, so only one piece of its retrieval information is retained as target retrieval information. The second and third clusters are determined to be normal clusters, and their retrieval information is retained as target retrieval information, resulting in a total of 10 target search results. Compared to the initial 40 search results, the amount of search information to be processed is reduced by 75%.
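
The 75% figure follows directly from the two counts in this example (40 retained search results, 10 target search results):

```latex
\[
\frac{40 - 10}{40} = \frac{30}{40} = 0.75 = 75\%
\]
```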

[0186] After determining the target retrieval information, the target retrieval information and input information are fed into the large model. Based on the input information, the large model rewrites the semantics, including effect recommendations, potential problems, and authoritative registration information. The generated feedback result is: "XX face cream has a good whitening effect, but this product has allergy controversies. Furthermore, its official registration information shows..."

[0187] The feedback results provided above offer users a comprehensive and accurate reference, reducing the time and effort required for users to discern the feedback and optimizing their interactive experience.

[0188] Figure 7 schematically shows a block diagram of an interactive device according to an embodiment of the present disclosure.

[0189] As shown in Figure 7, the interactive device 700 of this embodiment includes a retrieval module 710, a clustering module 720, an identification module 730, a determination module 740, and a generation module 750.

[0190] The retrieval module 710 is used to retrieve input information entered through the interactive interface and obtain multiple retrieval results.

[0191] The clustering module 720 is used to cluster multiple search information based on their respective semantic information to obtain clusters.

[0192] The identification module 730 is used to identify clusters and determine the identification results indicating whether the clusters are abnormal.

[0193] The determination module 740 is used to determine target retrieval information for reference from the clusters based on the recognition results.

[0194] The generation module 750 is used to generate feedback results for the input information based on the target retrieval information.
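
How the five modules might be chained can be sketched as a small skeleton. This is an assumption-laden illustration: the class name, the callable signatures, and the stub lambdas are hypothetical, and each module is reduced to a plain function.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class InteractiveDevice:
    retrieve: Callable[[str], List[str]]               # retrieval module 710
    cluster: Callable[[List[str]], List[List[str]]]    # clustering module 720
    identify: Callable[[List[str]], bool]              # identification module 730 (True = abnormal)
    determine: Callable[[List[str], bool], List[str]]  # determination module 740
    generate: Callable[[str, List[str]], str]          # generation module 750

    def respond(self, input_info: str) -> str:
        retrieved = self.retrieve(input_info)
        targets: List[str] = []
        for members in self.cluster(retrieved):
            abnormal = self.identify(members)
            targets.extend(self.determine(members, abnormal))
        return self.generate(input_info, targets)

# Stub modules, for illustration only.
device = InteractiveDevice(
    retrieve=lambda q: ["good a", "good b", "bad a"],
    cluster=lambda infos: [
        [i for i in infos if i.startswith("good")],
        [i for i in infos if i.startswith("bad")],
    ],
    identify=lambda members: len(members) < 2,
    determine=lambda members, abnormal: [] if abnormal else members,
    generate=lambda q, targets: f"{q} -> {', '.join(targets)}",
)
answer = device.respond("query")
# answer == "query -> good a, good b"
```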

[0195] According to embodiments of this disclosure, the identification module 730 includes a first identification submodule, a second identification submodule, and an identification submodule.

[0196] The first identification submodule is used to identify the source information of the retrieved information in the cluster and obtain the source identification result. The source information indicates the source of the retrieved information.

[0197] The second identification submodule is used to identify the information features of the retrieved information in the cluster when the source identification result indicates that there is an anomaly in the cluster, and obtain the information feature identification result. The information features indicate the attributes of the retrieved information.

[0198] The identification submodule is used to obtain the identification result based on the source identification result and the information feature identification result.
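
The two-stage structure of the identification module (information features are examined only when the source identification result already indicates an anomaly) can be sketched as below. The thresholds, the direction of each comparison, and the way the two results are combined are assumptions chosen to match the face cream example, not a statement of the disclosed formula.

```python
def identify_cluster(members, source_result, feature_result,
                     source_threshold=0.2, recognition_threshold=0.6):
    """Return a dict describing whether the cluster is considered abnormal.

    Assumptions: a LOW source result signals a possible anomaly, and a HIGH
    information feature result confirms it.
    """
    source = source_result(members)
    if source >= source_threshold:
        # Sources look healthy; the feature check is skipped.
        return {"abnormal": False, "source": source, "feature": None}
    feature = feature_result(members)
    return {
        "abnormal": feature > recognition_threshold,
        "source": source,
        "feature": feature,
    }

calls = []

def feature_stub(members):
    """Hypothetical feature scorer that records when it is invoked."""
    calls.append(len(members))
    return 0.9

suspicious = identify_cluster(["r"] * 30, lambda m: 0.03, feature_stub)
diverse = identify_cluster(["r"] * 9, lambda m: 0.33, feature_stub)
```

In this run `feature_stub` is invoked only for the first cluster, which keeps the more expensive feature analysis off clusters whose sources already look normal.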

[0199] According to embodiments of this disclosure, the first identification submodule includes a first identification unit, a second identification unit, and a source identification unit.

[0200] The first identification unit is used to identify the diversity of the source information of the retrieved information in the cluster and obtain the first source identification result.

[0201] The second identification unit is used to identify the professionalism of the source information of the retrieved information in the cluster, and obtain the second source identification result.

[0202] The source identification unit is used to obtain the source identification result based on the first source identification result and the second source identification result.

[0203] According to embodiments of this disclosure, the first identification unit includes a dividing subunit and an identification subunit.

[0204] The dividing subunit is used to divide the source information of each piece of retrieved information in the cluster into categories and obtain the source category result.

[0205] The identification subunit is used to determine the first source identification result based on the source category result and the amount of retrieval information in the cluster.
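
As an illustration of a diversity measure built from the source category result and the amount of retrieval information, the sketch below uses the ratio of distinct source categories to cluster size. This ratio is a stand-in chosen for illustration; it is not equation (1) of the disclosure, and the category labels are hypothetical.

```python
def source_diversity(source_categories):
    """Distinct source categories divided by the number of items in the cluster."""
    n = len(source_categories)
    if n == 0:
        return 0.0
    return len(set(source_categories)) / n

# 30 items from a single category: low diversity.
single_source = source_diversity(["level4"] * 30)
# 9 items spread over three categories: higher diversity.
mixed_source = source_diversity(["level2", "level3", "level4"] * 3)
```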

[0206] According to embodiments of this disclosure, the second identification submodule includes a time period identification unit, a topic identification unit, and an information feature identification unit.

[0207] The time period identification unit is used to identify the publication time period of the search information in the cluster and obtain the publication feature identification result.

[0208] The topic identification unit is used to identify the topic information of the retrieved information in the cluster and obtain the topic diversity identification result.

[0209] The information feature identification unit is used to obtain the information feature identification result based on the publication feature identification result and the topic diversity identification result.
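
A possible shape for the two information feature measurements is sketched below. Both measures and their equal weighting are assumptions: publication concentration is taken as the largest share of items falling inside a sliding window of days, and topic homogeneity as one minus the normalized count of extra topics.

```python
from datetime import date

def publication_concentration(dates, window_days=7):
    """Largest share of items published within any window of `window_days` days."""
    if not dates:
        return 0.0
    ordered = sorted(dates)
    best = 0
    start = 0
    for end in range(len(ordered)):
        # Shrink the window until it spans fewer than window_days days.
        while (ordered[end] - ordered[start]).days >= window_days:
            start += 1
        best = max(best, end - start + 1)
    return best / len(ordered)

def topic_homogeneity(topics):
    """1.0 when every item shares one topic; 1/n when all n topics differ."""
    if not topics:
        return 0.0
    return 1.0 - (len(set(topics)) - 1) / len(topics)

def information_feature_result(dates, topics, window_days=7):
    """Equal-weight combination (hypothetical) of the two measures."""
    return (0.5 * publication_concentration(dates, window_days)
            + 0.5 * topic_homogeneity(topics))

dates = [date(2026, 1, 1), date(2026, 1, 2), date(2026, 1, 3), date(2026, 3, 1)]
topics = ["whitening"] * 4
result = information_feature_result(dates, topics)
# result == 0.875
```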

[0210] According to embodiments of this disclosure, the determining module 740 includes a first determining submodule.

[0211] The first determining submodule is used to determine the retrieval information whose semantic vector is located within the central range of the cluster when the recognition result indicates that the cluster is normal, and to use it as the target retrieval information. Here, the central range means that the similarity between the semantic vector and the cluster center is greater than a first similarity threshold. The cluster center is determined based on the semantic vector of each retrieval information in the cluster.
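
The central-range test can be sketched as follows, assuming the cluster center is the mean of the semantic vectors and the similarity is cosine similarity. Both choices, along with the function names and example vectors, are assumptions; the text only requires that the center be determined from the semantic vectors in the cluster.

```python
import math

def centroid(vectors):
    """Component-wise mean of the semantic vectors (assumed cluster center)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def central_range(items, first_threshold):
    """Ids whose similarity to the cluster center exceeds first_threshold."""
    center = centroid([vec for _, vec in items])
    return [info_id for info_id, vec in items
            if cosine(vec, center) > first_threshold]

items = [
    ("a", [1.0, 0.0]),
    ("b", [1.0, 0.1]),
    ("c", [0.9, 0.0]),
    ("d", [0.0, 1.0]),
]
core = central_range(items, first_threshold=0.9)
# core == ["a", "b", "c"]
```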

[0212] According to embodiments of this disclosure, the interactive device 700 further includes a first information determination module.

[0213] The first information determination module is used to, when there are multiple pieces of retrieval information whose semantic vectors are located within the central range of the cluster, determine from them, based on their respective first evaluation results, a number of pieces of target retrieval information that matches the identification result. The first evaluation result is determined based on at least one of the following: the similarity between the retrieval information and the input information, and the source information of the retrieval information.

[0214] According to embodiments of this disclosure, the determining module 740 includes a second determining submodule.

[0215] The second determining submodule is used to determine, when the identification result indicates that the cluster is abnormal, the retrieval information whose semantic vector is located within the boundary range of the cluster, and to use it as the target retrieval information. The boundary range means that the similarity between the semantic vector and the cluster center is greater than the second similarity threshold and less than or equal to the first similarity threshold.

[0216] According to embodiments of this disclosure, the interactive device 700 further includes a second information determination module.

[0217] The second information determination module is used to, when there are multiple pieces of retrieval information whose semantic vectors are located within the boundary range of the cluster, determine from them, based on their respective second evaluation results, a number of pieces of target retrieval information that matches the identification result. The second evaluation result is determined based on the source information of the retrieval information.

[0218] According to embodiments of this disclosure, the interactive device 700 further includes a third information determination module.

[0219] The third information determination module is used to determine, when the multiple retrieval information includes outlier retrieval information that has not been clustered into any cluster, the target retrieval information from the outlier retrieval information based on the source information of the outlier retrieval information.
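
Outlier handling by source information might be as simple as the sketch below. It assumes numeric source levels in which a lower number means a more authoritative source, and a hypothetical cut-off `max_level`; neither is fixed by the text.

```python
def select_outliers(outliers, source_level, max_level=2):
    """Keep only outliers whose source level is at or above the cut-off quality."""
    return [o for o in outliers if source_level(o) <= max_level]

# Hypothetical source levels for two outlier results.
levels = {"registry_page": 1, "forum_post": 4}
kept = select_outliers(["registry_page", "forum_post"], levels.get, max_level=2)
# kept == ["registry_page"]
```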

[0220] According to embodiments of this disclosure, the retrieval module 710 includes a first information determination submodule and a retrieval submodule.

[0221] The first information determination submodule is used to determine the source information based on the intent recognition result of the input information, wherein the source information indicates the source of the retrieved information.

[0222] The retrieval submodule is used to determine multiple retrieval information that match the input information based on the source information.

[0223] According to embodiments of this disclosure, the first information determination submodule includes a first information determination unit.

[0224] The first information determination unit is used to determine the source information based on the intent recognition result and the source mapping relationship when the intent recognition result indicates that retrieval is required. The source mapping relationship represents the correspondence between the intent recognition result and the source.
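
The source mapping relationship can be pictured as a lookup table from intent labels to sources. Every label, source name, and the fallback value in this sketch is hypothetical.

```python
# Hypothetical mapping from intent recognition results to retrieval sources.
SOURCE_MAPPING = {
    "product_review": ["e_commerce_reviews", "regulatory_registry"],
    "medical_question": ["medical_encyclopedia"],
}

def determine_sources(intent):
    """`intent` is a dict with 'needs_retrieval' (bool) and 'label' (str)."""
    if not intent["needs_retrieval"]:
        return []  # no retrieval required, so no sources are selected
    return SOURCE_MAPPING.get(intent["label"], ["general_web_search"])

sources = determine_sources({"needs_retrieval": True, "label": "product_review"})
# sources == ["e_commerce_reviews", "regulatory_registry"]
```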

[0225] According to embodiments of this disclosure, clustering module 720 includes a similarity determination submodule and a clustering submodule.

[0226] The similarity determination submodule is used to determine the semantic similarity between multiple search results based on their respective semantic information, thus obtaining multiple semantic similarities.

[0227] The clustering submodule is used to cluster multiple retrieved information based on multiple semantic similarities to obtain clusters.
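
The two submodules can be illustrated with a pairwise similarity pass followed by a simple grouping step. The grouping here is single-linkage merging with a union-find structure, swapped in for illustration because this passage does not name a clustering algorithm; the threshold and minimum cluster size are likewise assumptions.

```python
import math
from itertools import combinations

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def cluster_by_similarity(vectors, threshold=0.8, min_size=2):
    """Merge items whose pairwise similarity reaches `threshold`.

    Returns (clusters, outliers) as lists of indices into `vectors`.
    """
    n = len(vectors)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    clusters = [g for g in groups.values() if len(g) >= min_size]
    outliers = [i for g in groups.values() if len(g) < min_size for i in g]
    return clusters, outliers

vectors = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9], [-1, -1]]
clusters, outliers = cluster_by_similarity(vectors)
# clusters == [[0, 1], [2, 3]], outliers == [4]
```

Items that end up in a group smaller than `min_size` are reported separately, which corresponds to the outlier retrieval information that is not clustered into any cluster.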

[0228] According to embodiments of this disclosure, the generation module 750 includes a generation submodule.

[0229] The generation submodule is used to input the target retrieval information and input information into the large model and generate feedback results.

[0230] Figure 8 shows a schematic block diagram of an intelligent agent according to an embodiment of the present disclosure.

[0231] In embodiments of this disclosure, as shown in Figure 8, the intelligent agent 800 may include an input module 810, a processing module 820, and an output module 830.

[0232] Input module 810 is used to receive input information.

[0233] The processing module 820 is used to determine the target task based on the input information received by the input module, determine the large model based on the target task, and obtain output information by calling the large model to execute the interaction method provided according to the embodiments of this disclosure.

[0234] Output module 830 is used to output the output information obtained by the processing module.

[0235] According to embodiments of this disclosure, the input module 810 is responsible for receiving or sensing information such as queries, requests, instructions, signals, or data from the outside world (e.g., users or the external environment), and converting it into a format that the intelligent agent 800 can understand and process. The input module 810 is the primary link for the intelligent agent 800 to interact with the outside world, enabling the intelligent agent 800 to efficiently and accurately obtain necessary "sensory" information from the outside world and respond to this information.

[0236] In the example, the input module 810 can input the input information and target retrieval information described above.

[0237] In the example, the processing module 820 is the core support for the agent 800's ability to handle complex tasks. The processing module 820 can execute the interaction method described above.

[0238] In the example, the performance of processing module 820 is closely related to the large model on which agent 800 is based. To fully leverage the capabilities of the large model, the internal structure of processing module 820 can be designed to be highly configurable and scalable to handle various types of tasks and requirements in real-world scenarios.

[0239] Understandably, although large models possess excellent language understanding and generation capabilities, their ability to solve tasks, like that of humans, is limited without the aid of tools. When the agent 800 is given the ability to invoke tools, it can perform tasks such as using a calculator for mathematical calculations, using Python for data analysis, and using a search engine to look up weather forecasts.

[0240] In the example, the output module 830 can output the search information, feedback results, etc. described above.

[0241] The intelligent agent 800 according to the embodiments of this disclosure can simply and effectively improve the level of intelligence, and enhance flexibility and versatility.

[0242] According to embodiments of this disclosure, this disclosure also provides an electronic device, a readable storage medium, and a computer program product.

[0243] According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method described above.

[0244] According to embodiments of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions, wherein the computer instructions are used to cause a computer to perform the method described above.

[0245] According to an embodiment of this disclosure, a computer program product includes a computer program that, when executed by a processor, implements the method described above.

[0246] Figure 9 schematically illustrates a block diagram of an electronic device suitable for implementing the interaction method according to embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the present disclosure described and / or claimed herein.

[0247] As shown in Figure 9, device 900 includes a computing unit 901, which can perform various appropriate actions and processes based on a computer program stored in read-only memory (ROM) 902 or a computer program loaded from storage unit 908 into random access memory (RAM) 903. RAM 903 may also store various programs and data required for the operation of device 900. The computing unit 901, ROM 902, and RAM 903 are interconnected via bus 904. Input / output (I / O) interface 905 is also connected to bus 904.

[0248] Multiple components in device 900 are connected to input / output (I / O) interface 905, including: input unit 906, such as keyboard, mouse, etc.; output unit 907, such as various types of monitors, speakers, etc.; storage unit 908, such as disk, optical disk, etc.; and communication unit 909, such as network card, modem, wireless transceiver, etc. Communication unit 909 allows device 900 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.

[0249] The computing unit 901 can be a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 901 performs the various methods and processes described above, such as interactive methods. For example, in some embodiments, the interactive method may be implemented as a computer software program tangibly contained in a machine-readable medium, such as storage unit 908. In some embodiments, part or all of the computer program may be loaded and / or installed on device 900 via ROM 902 and / or communication unit 909. When the computer program is loaded into RAM 903 and executed by the computing unit 901, one or more steps of the interactive method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform interactive methods by any other suitable means (e.g., by means of firmware).

[0250] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and / or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.

[0251] The program code used to implement the methods of this disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that when executed by the processor or controller, the program code causes the functions / operations specified in the flowcharts and / or block diagrams to be implemented. The program code may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.

[0252] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

[0253] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0254] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as a data server), or computing systems that include middleware components (e.g., an application server), or computing systems that include frontend components (e.g., a user computer with a graphical user interface or web browser through which a user can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

[0255] Computer systems can include clients and servers. Clients and servers are generally located far apart and typically interact via communication networks. Client-server relationships are created by computer programs running on the respective computers and having a client-server relationship with each other. Servers can be cloud servers, distributed system servers, or servers incorporating blockchain technology.

[0256] It should be understood that steps may be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this disclosure can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution disclosed in this disclosure can be achieved, and this is not limited herein.

[0257] The specific embodiments described above do not constitute a limitation on the scope of protection of this disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this disclosure should be included within the scope of protection of this disclosure.

Claims

1. An interaction method, comprising: retrieving input information entered through an interactive interface to obtain multiple pieces of retrieval information; clustering the multiple pieces of retrieval information based on their respective semantic information to obtain clusters; identifying the clusters and determining an identification result indicating whether the clusters are abnormal; determining, based on the identification result, target retrieval information for reference from the clusters; and generating, based on the target retrieval information, a feedback result for responding to the input information.

2. The method according to claim 1, wherein identifying the clusters and determining the identification result indicating whether the clusters are abnormal comprises: identifying source information of the retrieval information in the cluster to obtain a source identification result, wherein the source information indicates the source of the retrieval information; when the source identification result indicates that the cluster is abnormal, identifying information features of the retrieval information in the cluster to obtain an information feature identification result, wherein the information features indicate attributes of the retrieval information; and obtaining the identification result based on the source identification result and the information feature identification result.

3. The method according to claim 2, wherein identifying the source information of the retrieval information in the cluster to obtain the source identification result comprises: identifying the diversity of the source information of the retrieval information in the cluster to obtain a first source identification result; identifying the professionalism of the source information of the retrieval information in the cluster to obtain a second source identification result; and obtaining the source identification result based on the first source identification result and the second source identification result.

4. The method according to claim 3, wherein identifying the diversity of the source information of the retrieval information in the cluster to obtain the first source identification result comprises: dividing the source information of each piece of retrieval information in the cluster into categories to obtain a source category result; and determining the first source identification result based on the source category result and the number of pieces of retrieval information in the cluster.

5. The method according to any one of claims 2 to 4, wherein identifying the information features of the retrieval information in the cluster to obtain the information feature identification result comprises: identifying the publication time period of the retrieval information in the cluster to obtain a publication feature identification result; identifying topic information of the retrieval information in the cluster to obtain a topic diversity identification result; and obtaining the information feature identification result based on the publication feature identification result and the topic diversity identification result.

6. The method according to any one of claims 1 to 5, wherein determining, based on the identification result, target retrieval information for reference from the clusters comprises: when the identification result indicates that the cluster is normal, determining, from the cluster, retrieval information whose semantic vector is located within the central range of the cluster as the target retrieval information, wherein the central range indicates that the similarity between the semantic vector and the cluster center of the cluster is greater than a first similarity threshold, and the cluster center is determined based on the semantic vector of each piece of retrieval information in the cluster.

7. The method according to claim 6, further comprising: when there are multiple pieces of retrieval information whose semantic vectors are located within the central range of the cluster, determining, from the multiple pieces of retrieval information and based on their respective first evaluation results, a number of pieces of target retrieval information that matches the identification result, wherein the first evaluation result is determined based on at least one of the following: the similarity between the retrieval information and the input information, and the source information of the retrieval information.

8. The method according to any one of claims 1 to 7, wherein determining, based on the identification result, target retrieval information for reference from the clusters comprises: when the identification result indicates that the cluster is abnormal, determining, from the cluster, retrieval information whose semantic vector is located within the boundary range of the cluster as the target retrieval information, wherein the boundary range indicates that the similarity between the semantic vector and the cluster center of the cluster is greater than a second similarity threshold and less than or equal to the first similarity threshold.

9. The method according to claim 8, further comprising: when there are multiple pieces of retrieval information whose semantic vectors are located within the boundary range of the cluster, determining, from the multiple pieces of retrieval information and based on a second evaluation result of each of them, a number of pieces of target retrieval information that matches the identification result, wherein the second evaluation result is determined based on the source information of the retrieval information.

10. The method according to any one of claims 1 to 9, further comprising: when the multiple pieces of retrieval information include outliers that are not clustered into any cluster, determining the target retrieval information from the outliers based on the source information of the outliers.

11. The method according to any one of claims 1 to 10, wherein retrieving the input information entered through the interactive interface to obtain multiple pieces of retrieval information comprises: determining source information based on an intent recognition result of the input information, wherein the source information indicates the source of the retrieval information; and determining, based on the source information, multiple pieces of retrieval information matching the input information.

12. The method according to claim 11, wherein determining the source information based on the intent recognition result of the input information comprises: when the intent recognition result indicates that retrieval is required, determining the source information based on the intent recognition result and a source mapping relationship, wherein the source mapping relationship represents the correspondence between intent recognition results and sources.
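
The source mapping relationship of claim 12 can be pictured as a lookup table from intent to sources. In the sketch below, every intent label and source name is hypothetical, and the fallback to a general web search for unmapped intents is an assumption that the claim does not state.

```python
# Hypothetical source mapping relationship: intent -> retrieval sources.
SOURCE_MAPPING = {
    "medical_question": ["medical_knowledge_base"],
    "news_query": ["news_index", "web_search"],
}

# Hypothetical intents for which no retrieval is required.
NO_RETRIEVAL_INTENTS = {"chitchat"}


def determine_sources(intent):
    """Return the sources to query for an intent, or an empty list
    when the intent indicates that no retrieval is required."""
    if intent in NO_RETRIEVAL_INTENTS:
        return []
    # Assumed fallback for intents missing from the mapping.
    return SOURCE_MAPPING.get(intent, ["web_search"])


sources = determine_sources("news_query")
```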

13. The method according to any one of claims 1 to 12, wherein clustering the multiple pieces of retrieval information based on their respective semantic information to obtain clusters comprises: determining semantic similarities between the multiple pieces of retrieval information based on the semantic information of each of them, to obtain multiple semantic similarities; and clustering the multiple pieces of retrieval information based on the multiple semantic similarities to obtain the clusters.
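
Claim 13 does not name a clustering algorithm. As one simple instance of clustering from pairwise semantic similarities, the sketch below links any two items whose cosine similarity exceeds a threshold and takes the connected groups as clusters; items left alone are reported as outliers. The algorithm choice, the threshold, and all names are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_by_similarity(vectors, threshold):
    """Group indices whose pairwise similarity exceeds the threshold
    (single-link grouping via union-find). Returns (clusters, outliers)."""
    n = len(vectors)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine_similarity(vectors[i], vectors[j]) > threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    clusters = [members for members in groups.values() if len(members) > 1]
    outliers = [members[0] for members in groups.values() if len(members) == 1]
    return clusters, outliers


# Illustrative data only: two tight pairs and one unrelated vector.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, -1.0]]
clusters, outliers = cluster_by_similarity(vectors, threshold=0.9)
```

With this data the result is two clusters, `[0, 1]` and `[2, 3]`, and index 4 is left as an outlier.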

14. The method according to any one of claims 1 to 13, wherein generating, based on the target retrieval information, a feedback result for responding to the input information comprises: inputting the target retrieval information and the input information into a large model to generate the feedback result.
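
How the target retrieval information and the input information are combined before reaching the large model is not specified in claim 14. A minimal sketch, assuming a plain-text prompt with numbered references (the template wording and function name are hypothetical, and no model is called):

```python
def build_prompt(input_information, target_retrieval_information):
    """Combine the input and the selected references into one prompt string."""
    references = "\n".join(
        f"[{index}] {text}"
        for index, text in enumerate(target_retrieval_information, start=1)
    )
    return (
        "Answer the question using only the references below.\n\n"
        f"References:\n{references}\n\n"
        f"Question: {input_information}\n"
    )


# Illustrative inputs only.
prompt = build_prompt(
    "What is the boiling point of water at sea level?",
    ["Water boils at 100 degrees Celsius at standard atmospheric pressure."],
)
```

The resulting string would then be passed to whichever large model the system uses.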

15. An interaction device, comprising: a retrieval module configured to retrieve input information entered through an interactive interface to obtain multiple pieces of retrieval information; a clustering module configured to cluster the multiple pieces of retrieval information based on their respective semantic information to obtain clusters; an identification module configured to identify the clusters and determine an identification result indicating whether the clusters are abnormal; a determining module configured to determine target retrieval information for reference from the clusters based on the identification result; and a generation module configured to generate, based on the target retrieval information, a feedback result for responding to the input information.

16. An intelligent agent, comprising: an input module configured to receive input information; a processing module configured to determine a target task based on the input information received by the input module, determine a large model based on the target task, and obtain output information by calling the large model to execute the method of any one of claims 1 to 14; and an output module configured to output the output information obtained by the processing module.

17. An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 14.

18. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1 to 14.

19. A computer program product comprising a computer program that, when executed by a processor, implements the method according to any one of claims 1 to 14.