Reply data generation method and device for dialogue window

By calculating the vitality score of user dialogue data, target data sets are selected to generate response data, which solves the problem of inaccurate response data in intelligent agent interaction and achieves higher accuracy and relevance.

CN122240788A · Pending · Publication Date: 2026-06-19 · FIBOCOM WIRELESS

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
FIBOCOM WIRELESS
Filing Date
2026-03-27
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In existing technologies, the response data generated by intelligent agents when interacting with users through dialogue windows is inaccurate.

Method used

By acquiring user dialogue data, the system queries the database for the first matching dataset based on the dialogue data, calculates the vitality score for each dataset, and selects the target dataset based on confidence, similarity, number of searches, and sentiment value to generate response data.

Benefits of technology

This improves the accuracy of generated response data and ensures that the importance of the dataset is relevant to user interactions.

✦ Generated by Eureka AI based on patent content.

Abstract

This application relates to a method and apparatus for generating response data for a dialogue window. The method includes: acquiring user dialogue data; querying a database for a first data set matching the dialogue data, wherein the database includes multiple data sets; acquiring a vitality score for each of the first data sets, wherein the vitality score is calculated based on the confidence level of the first data set, the similarity between the first data set and an active data set, the number of times the first data set is retrieved, and the sentiment value of the first data set, and is used to represent the importance of the first data set; determining a target data set from the first data sets according to the vitality score; and generating response data for the dialogue data based on the target data set. This application solves the technical problem of inaccurate generated response data.

Description

Technical Field

[0001] This application relates to the field of user interaction, and in particular to a method and apparatus for generating response data for a dialogue window.

Background Technology

[0002] With the rapid development of large language model (LLM) technology, intelligent agents have demonstrated outstanding performance in areas such as dialogue generation and task assistance. For example, they can interact with users through dialogue windows to engage in conversations or assist users in handling tasks.

[0003] However, in existing technologies, when a user interacts with an agent through a dialogue window, the agent typically retrieves relevant information from a database based on the user's dialogue data and then generates response data based on that information. This method can only generate response data based on information in the database that is related to the dialogue data, resulting in inaccurate response data.

Summary of the Invention

[0004] This application provides a method and apparatus for generating response data for a dialog window to solve the technical problem of inaccurate generated response data.

[0005] In a first aspect, this application provides a method for generating response data for a dialogue window, comprising: acquiring user dialogue data; querying a database for a first data set matching the dialogue data, wherein the database includes multiple data sets; acquiring a vitality score for each first data set, wherein the vitality score is calculated based on the confidence level of the first data set, the similarity between the first data set and an active data set, the number of searches for the first data set, and the sentiment value of the first data set, and is used to represent the importance of the first data set; determining a target data set from the first data set according to the vitality score; and generating response data for the dialogue data based on the target data set.

[0006] In a second aspect, this application provides a device for generating response data for a dialogue window, comprising: an acquisition module for acquiring user dialogue data; a query module for querying a database for a first data set matching the dialogue data, wherein the database includes multiple data sets; a scoring module for acquiring a vitality score for each first data set, wherein the vitality score is calculated based on the confidence level of the first data set, the similarity between the first data set and an active data set, the number of times the first data set is retrieved, and the sentiment value of the first data set, and is used to represent the importance of the first data set; a determination module for determining a target data set from the first data sets according to the vitality score; and a generation module for generating response data for the dialogue data based on the target data set.

[0007] Compared with the prior art, the technical solution provided in this application has the following advantages. The solution obtains user dialogue data; queries a database for a first data set matching the dialogue data, wherein the database includes multiple data sets; obtains a vitality score for each first data set, wherein the vitality score is calculated based on the confidence level of the first data set, the similarity between the first data set and an active data set, the number of searches for the first data set, and the sentiment value of the first data set, and is used to represent the importance of the first data set; determines a target data set from the first data sets according to the vitality score; and generates response data for the dialogue data based on the target data set. Thus, after a matching first data set is found based on the dialogue data, its importance is obtained by calculating the vitality score from the confidence level of the first data set, the similarity between the first data set and an active data set, the number of searches for the first data set, and the sentiment value of the first data set. The target data set is then selected based on the vitality score to generate response data, thereby improving the accuracy of the generated response data.

Attached Figure Description

[0008] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.

[0009] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0010] One or more embodiments are illustrated by way of example with reference numerals in the accompanying drawings. These illustrations do not constitute a limitation on the embodiments. Elements with the same reference numerals in the drawings are denoted as similar elements. Unless otherwise stated, the figures in the drawings are not to be limited by scale.

[0011] Figure 1 is a flowchart illustrating a method for generating response data for a dialogue window, as provided in an embodiment of this application;
Figure 2 is a flowchart illustrating another method for generating response data for a dialogue window, as provided in an embodiment of this application;
Figure 3 is a flowchart illustrating yet another method for generating response data for a dialogue window, as provided in an embodiment of this application;
Figure 4 is a flowchart illustrating yet another method for generating response data for a dialogue window, as provided in an embodiment of this application;
Figure 5 is a schematic diagram of a device for generating response data for a dialogue window, as provided in an embodiment of this application;
Figure 6 is a schematic diagram of an electronic device provided in an embodiment of this application.

Detailed Implementation

[0012] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0013] The following disclosure provides numerous different embodiments or examples for implementing various structures of the invention. To simplify the disclosure, specific examples of components and arrangements are described below. These are merely examples and are not intended to limit the scope of the invention. Furthermore, reference numerals and / or letters may be repeated in different examples. Such repetition is for simplification and clarity and does not in itself indicate a relationship between the various embodiments and / or arrangements discussed.

[0014] To address the technical problem of inaccurate response data generated in existing technologies, this application provides a method for generating response data for a dialog window, which can improve the accuracy of response data in the dialog window.

[0015] Figure 1 is a flowchart illustrating a method for generating response data for a dialogue window, as provided in an embodiment of this application. As shown in Figure 1, the method includes:
S102, obtaining the user's dialogue data;
S104, querying the database, based on the dialogue data, for a first data set that matches the dialogue data, wherein the database includes multiple data sets;
S106, obtaining the vitality score for each first data set, wherein the vitality score is calculated based on the confidence of the first data set, the similarity between the first data set and the active data set, the number of retrievals of the first data set, and the sentiment value of the first data set, and is used to represent the importance of the first data set;
S108, determining the target data set from the first data sets based on the vitality score;
S110, generating response data for the dialogue data based on the target data set.

[0016] This application can be applied to the process of users interacting with intelligent agents through a dialogue window. The dialogue window is the interface for interaction between the user and the intelligent agent, which can be displayed on the client. Users can input dialogue data through the dialogue window and view response data through the dialogue window.

[0017] The type of intelligent agent in this application is not limited. The intelligent agent corresponds to a database, which stores data sets. The dialogue data input by the user is used to search the database for the first data set. After filtering, the target data set is obtained, and the response data is generated based on the target data set and fed back to the user.

[0018] There are multiple ways to obtain dialogue data in this application. For example, users can enter voice, text, or images, take screenshots or photos, or have text recognized from a screenshot. The dialogue window can use the text, voice, images, screenshots, or photos input by the user as dialogue data.

[0019] The database in this application can store data scraped from the web, as well as question-and-answer data, push data, and other data generated during historical conversations of the current user and other users with the intelligent agent. For private intelligent agents, data is not shared; only data scraped from the web and question-and-answer data generated during the user's own conversations are stored, and this database only provides services to the current user.

[0020] The database in this application may include multiple data sets, each used to store different types of data. One example is classifying the data and then dividing the classified data into different data sets. Another example is grouping data with high similarity into the same data set.

[0021] A data set in this application may include multiple data units, which are grouped into one data set because they are of the same type or have high similarity.

[0022] In this application, for user dialogue data, a first data set matching it can be retrieved from the database. The first data set is the set of data in the database that matches the dialogue data.

[0023] In this application, each data set in the database can correspond to a vitality score, which represents the importance of the data set. A higher vitality score indicates that the data in that data set is more important and more likely to be used to generate response data; conversely, a lower vitality score indicates that the data in that data set is less important and less likely to be used to generate response data. Data sets with a vitality score below a certain level can be moved to a less frequently used database or storage space.

[0024] The vitality score of the first data set can be calculated either when the data units within the data set change or when the first data set is determined. The calculation uses four inputs: the confidence of the first data set, the similarity between the first data set and the active data set, the number of retrievals of the first data set, and the sentiment value of the first data set. The confidence of the first data set is derived from the confidence of the data units within it; for example, the confidence of each data unit is calculated and the average is taken as the confidence of the data set. To calculate the similarity between the first data set and the active data set, the active data set must first be determined. An active data set is a data set that has been used many times as a target data set to generate response data, and it can be identified from the number of times it has served as a target data set in the past. The similarity is then obtained by comparing the content of the data units in the two data sets. A high similarity indicates that the content of the first data set is also important and likely to be used. The number of retrievals of the first data set is the number of times it has been used as the target data set to generate response data. Finally, the sentiment value represents the sentiment of the first data set. An intelligent model identifies the sentiment of every data unit in the first data set, the sentiment that occurs most often is selected, and the sentiment value of that sentiment is taken as the sentiment value of the data set. Different sentiments correspond to different sentiment values.
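The two aggregation steps described above (averaging unit confidences and taking the value of the most frequent sentiment) can be sketched as follows. This is a minimal illustration only: the sentiment labels and the numeric values in `SENTIMENT_VALUES` are hypothetical and do not come from the application, which states only that different sentiments map to different values.

```python
from collections import Counter

# Hypothetical mapping from sentiment label to sentiment value (assumed, not from the source).
SENTIMENT_VALUES = {"positive": 1.0, "neutral": 0.5, "negative": 0.2}


def dataset_confidence(unit_confidences):
    """Average the confidence of all data units in the data set."""
    return sum(unit_confidences) / len(unit_confidences)


def dataset_sentiment_value(unit_sentiments):
    """Return the value of the sentiment that occurs most often in the data set."""
    label, _count = Counter(unit_sentiments).most_common(1)[0]
    return SENTIMENT_VALUES[label]


confidences = [0.9, 0.7, 0.8]
sentiments = ["positive", "negative", "positive"]
print(dataset_confidence(confidences))
print(dataset_sentiment_value(sentiments))
```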

[0025] After completing the steps described above to calculate the vitality score of the first dataset, a vitality score for the first dataset is obtained. After removing datasets with low vitality scores, the remaining datasets are used as the target dataset to generate the response data.
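As a rough sketch of this filtering step, the snippet below drops data sets whose vitality score falls below a threshold and returns the rest, highest score first. The function name `select_target_datasets`, the threshold `min_vitality`, and the ordering of the result are assumptions made for illustration; the application does not specify how "low" is defined.

```python
def select_target_datasets(vitality_by_dataset, min_vitality):
    """Keep data sets whose vitality score reaches the threshold, highest first."""
    kept = [
        (name, score)
        for name, score in vitality_by_dataset.items()
        if score >= min_vitality  # assumed cut-off rule
    ]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _score in kept]


scores = {"a": 0.9, "b": 0.2, "c": 0.6}
targets = select_target_datasets(scores, min_vitality=0.5)
print(targets)  # ['a', 'c']
```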

[0026] The solution provided in this application involves: acquiring user dialogue data; querying a database for a first data set matching the dialogue data, wherein the database includes multiple data sets; acquiring a vitality score for each first data set, wherein the vitality score is calculated based on the confidence level of the first data set, the similarity between the first data set and an active data set, the number of searches for the first data set, and the sentiment value of the first data set, and is used to represent the importance of the first data set; determining a target data set from the first data set according to the vitality score; and generating response data for the dialogue data based on the target data set. Thus, after finding a matching first data set based on the dialogue data, the vitality score is calculated based on the confidence level of the first data set, the similarity between the first data set and an active data set, the number of searches for the first data set, and the sentiment value of the first data set to obtain the importance of the first data set. Furthermore, the target data set is selected based on the vitality score to generate response data, thereby improving the accuracy of the generated response data.

[0027] As an optional example, as shown in Figure 2, after obtaining the user's dialogue data, the above method further includes:
S202, when the dialogue data is text data, converting the dialogue data into a multi-dimensional vector;
S204, when the dialogue data is voice data, converting the voice data into text data and then into a multi-dimensional vector;
S206, when the dialogue data is image data, converting the image data into a multi-dimensional vector. If the dimension of the multi-dimensional vector of the image data is inconsistent with the dimension of the multi-dimensional vector of the text data, the multi-dimensional vector of the image data is linearly mapped to be consistent with the dimension of the multi-dimensional vector of the text data.

[0028] In this application, since user dialogue data may be of various types, such as text, voice, and images, separate databases can be kept for the different types of data. In that case, the database corresponding to the type of the user's dialogue data is selected, and the first data set is retrieved from that database. Alternatively, different types of data, such as text, voice, and images, can be converted into the same format and stored in a single database. In this case, to retrieve the first data set from the database using the dialogue data, the dialogue data is converted to the unified format, that is, to the same format as the data in the database.

[0029] In this application, the text data, speech data, or image data can be converted into multidimensional vectors using a multimodal large model, while maintaining a uniform dimension. This ensures that text, speech, and image data can all be converted into multidimensional vectors of the same dimension. Thus, for the multidimensional vectors obtained from dialogue data, a first data set can be searched in a database through vector comparison. If the dimension of the multidimensional vector of the image data is inconsistent with the dimension of the multidimensional vector of the text data, the multidimensional vector of the image data is linearly mapped to have the same dimension as the multidimensional vector of the text data.

[0030] For example, text data can be converted into a 768-dimensional vector. Speech data is first transcribed to text, and the resulting text is then converted into a 768-dimensional vector. Image data can be converted into a 512-dimensional vector and then linearly mapped to 768 dimensions by multiplying the 512-dimensional vector by a 512 × 768 weight matrix.
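The linear mapping from 512 to 768 dimensions can be illustrated with a plain matrix product. The sketch below uses only the standard library; the randomly initialised weight matrix stands in for a learned projection, and the names `make_projection` and `project` are hypothetical. How the weight matrix is actually obtained is not stated in the application.

```python
import random


def make_projection(in_dim, out_dim, seed=0):
    """Build an in_dim x out_dim weight matrix (random placeholder for learned weights)."""
    rng = random.Random(seed)
    scale = 1.0 / in_dim ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(out_dim)] for _ in range(in_dim)]


def project(vector, weights):
    """Compute vector @ weights for a row vector and an in_dim x out_dim matrix."""
    out_dim = len(weights[0])
    return [
        sum(vector[i] * weights[i][j] for i in range(len(vector)))
        for j in range(out_dim)
    ]


W = make_projection(512, 768)
image_vec = [0.01 * (i % 7) for i in range(512)]  # stand-in for an image embedding
text_space_vec = project(image_vec, W)
print(len(text_space_vec))  # 768
```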

[0031] The method described in this application converts user dialogue data into multi-dimensional vectors, allowing for the search of a first data set in the database using vector similarity comparison, thus improving the efficiency of data set retrieval. Furthermore, since comparing vectors reveals the degree of similarity between them, the first data set found is highly relevant to the dialogue data, thereby improving the accuracy of the first data set search.
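A minimal sketch of such a vector comparison is shown below. It assumes that each data set is represented by the mean vector of its data units and that a data set counts as a match when its cosine similarity to the query exceeds a threshold; both choices, along with the names `find_first_datasets` and `min_similarity`, are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_first_datasets(query_vec, centroids, min_similarity):
    """Return (name, similarity) for data sets matching the query, best first."""
    matches = []
    for name, centroid in centroids.items():
        sim = cosine(query_vec, centroid)
        if sim > min_similarity:
            matches.append((name, sim))
    return sorted(matches, key=lambda m: m[1], reverse=True)


centroids = {
    "weather": [1.0, 0.0, 0.0],
    "music": [0.0, 1.0, 0.0],
    "travel": [0.8, 0.6, 0.0],
}
first_sets = find_first_datasets([1.0, 0.1, 0.0], centroids, min_similarity=0.5)
print([name for name, _sim in first_sets])  # ['weather', 'travel']
```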

[0032] As an optional example, as shown in Figure 3, after obtaining the user's dialogue data, the above method further includes:
S302, converting the dialogue data into a multi-dimensional vector;
S304, storing the dialogue data and the multi-dimensional vector as a single data unit in a data set within the database. The database also includes data units representing historical dialogue data, historical response data, and web search results. The database can contain multiple data sets, each of which can include multiple data units.

[0033] This application employs a scheme that converts text, images, and voice data into multi-dimensional vectors of the same dimension for storage, thereby storing multiple types of data such as text, voice, and images in a single database. The method for converting text, voice, and images in the database into multi-dimensional vectors is the same as the method for converting dialogue data into multi-dimensional vectors, and the converted multi-dimensional vectors have the same dimensions. The database can convert dialogue data from conversations between the current user or other users and the intelligent agent, the intelligent agent's response data, and data crawled from the web into multi-dimensional vectors and store them in the database.

[0034] When storing multidimensional vectors in a database, for a piece of text, audio, or image data, it is converted into a multidimensional vector and then combined with the data information of that data to store it as a data unit. Each data unit in the database is derived from a single piece of data. The data unit stores the data information of the data and the multidimensional vector, which is the vector obtained from the above conversion. The data information may include information such as the data type, the unique identifier of the data, the time point of data generation, the initial confidence level of the data, and the sentiment category of the data.
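One possible in-memory shape for such a data unit is sketched below. The class name `DataUnit` and its field names are hypothetical; they simply mirror the items of data information listed in the paragraph above (type, unique identifier, generation time, initial confidence, sentiment category) together with the vector.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List
import uuid


@dataclass
class DataUnit:
    """One stored item: its data information plus its multi-dimensional vector."""
    data_type: str          # e.g. "text", "voice", "image"
    vector: List[float]     # multi-dimensional vector of the item
    confidence: float       # initial confidence level
    sentiment: str          # sentiment category
    unit_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


unit = DataUnit(data_type="text", vector=[0.1, 0.2, 0.3], confidence=0.8, sentiment="neutral")
print(unit.data_type, len(unit.vector))
```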

[0035] In this application, data units are stored using a database. These data units are obtained by concatenating multi-dimensional vectors and data information converted from image, text, and speech data. This unifies the format of the data stored in the database, improving the efficiency of searching the first data set using dialogue data. Furthermore, storing each data unit in the database combines its data information and multi-dimensional vector, thus improving the data storage efficiency of the database.

[0036] As an optional example, storing the dialogue data and the multi-dimensional vector as a single data unit in a data set within the database includes: calculating the matching degree between the data unit and each existing data set using Formula (1); storing the data unit in the matching data set when the matching degree is greater than a preset matching degree; and creating a new data set for the data unit when it is not assigned to any existing data set.

$$M(u_i, D_j) = \cos(\mathbf{v}_i, \bar{\mathbf{v}}_j) = \frac{\mathbf{v}_i \cdot \bar{\mathbf{v}}_j}{\lVert \mathbf{v}_i \rVert \, \lVert \bar{\mathbf{v}}_j \rVert} \qquad (1)$$

where $M(u_i, D_j)$ is the matching degree, $u_i$ is data unit i, $D_j$ is data set j, $\mathbf{v}_i$ is the multi-dimensional vector of data unit i, and $\bar{\mathbf{v}}_j$ is the average of the multi-dimensional vectors of all data units in data set j.

[0037] In this application, the data information of each piece of data and its multi-dimensional vector are stored as a data unit, and the data units can in turn be partitioned, so that the originally independent data units are divided into different data sets, with the data units in the same data set having similar content. This results in multiple data sets, and comparing against data sets rather than individual data units improves the efficiency of finding the first data set.

[0038] When partitioning data units into data sets, the matching degree between each data unit and each data set must be compared. For example, given 100 data units, one data unit is randomly selected and assigned to a data set, designated data set 1. Another data unit is then randomly selected from the remaining 99, and its matching degree with data set 1 is calculated using Formula 1. In Formula 1, the correlation between the data unit and data set 1 is determined by calculating the cosine of the angle between the multi-dimensional vector of the data unit and the average of the multi-dimensional vectors of all data units in data set 1. The cosine value is compared with a preset matching degree. If the cosine value is greater than the preset matching degree, the data unit is also assigned to data set 1; if it is less than or equal to the preset matching degree, the data unit is assigned to a separate data set 2. Next, a data unit is randomly selected from the remaining 98 data units, and its cosine values with data set 1 and data set 2 are calculated, thereby determining whether it is assigned to data set 1, assigned to data set 2, or placed in a newly created data set.

[0039] After multiple comparisons, the original 100 data units were divided into multiple data sets. The multidimensional vectors of the data units in each data set are similar, while the multidimensional vectors of the data units in different data sets are not similar.

[0040] In this application, the number of data units that each data set can store can also be set, such as setting an upper limit of 100 to prevent the data set from becoming too large.

[0041] Data units newly added to the database can likewise be assigned in turn to an existing data set, or a new data set can be created for them. A data unit that could be assigned to several data sets is assigned to the one with the largest cosine value.
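The assignment procedure described above (cosine against each data set's mean vector, a preset matching degree, a size cap, and a preference for the largest cosine value) can be sketched as follows. The function names, the threshold value 0.8, and the two-dimensional toy vectors are illustrative assumptions; skipping full data sets is one possible reading of the upper limit.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Element-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def assign(unit_vec, datasets, min_match, max_units=100):
    """Add unit_vec to the best-matching data set or create a new one; return its index."""
    best_idx, best_score = None, min_match
    for idx, members in enumerate(datasets):
        if len(members) >= max_units:  # assumed handling of the size cap
            continue
        score = cosine(unit_vec, mean_vector(members))
        if score > best_score:  # must exceed the preset matching degree
            best_idx, best_score = idx, score
    if best_idx is None:
        datasets.append([unit_vec])
        return len(datasets) - 1
    datasets[best_idx].append(unit_vec)
    return best_idx


datasets = []
units = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [assign(vec, datasets, min_match=0.8) for vec in units]
print(labels)  # [0, 0, 1, 1]
```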

[0042] This application divides the data units in the database into data sets, grouping data units with similar multi-dimensional vectors into the same data set and separating dissimilar data units into different data sets. As a result, when searching for the first data set using dialogue data, the first data set can be determined directly from the comparison between the dialogue data and each data set, thus improving the search efficiency for the first data set.

[0043] As an optional example, obtaining the vitality score for each first data set includes calculating the vitality score using the following formula:

$$V = w_1 C + w_2 S + w_3 N + w_4 E \qquad (2)$$

where $V$ is the vitality score, $C$ is the confidence level of the first data set, $S$ is the similarity between the first data set and the active data set, $N$ is the number of searches performed on the first data set, $E$ is the sentiment value of the first data set, and $w_1$, $w_2$, $w_3$, $w_4$ are weights.

[0044] In this application, for each dataset in the database, a vitality score can be determined. This vitality score corresponds to the importance of the dataset in the database. The higher the importance, the greater the likelihood that the dataset will be used to generate response data. Datasets with low vitality scores can be considered to have a very low probability of being used to generate response data. Therefore, datasets with low vitality scores can be removed from the database and stored in a less frequently used database.

[0045] The vitality score can be recalculated when data units are added to or removed from the data set, or it can be calculated after the first data set is determined.

[0046] The calculation of vitality score involves multiple parameters. This application uses four parameters to calculate the vitality score of the dataset: the confidence score of the first dataset, the similarity between the first dataset and the active dataset, the number of searches of the first dataset, and the sentiment score of the first dataset. The calculation method is as shown in Formula 2 above.

[0047] The confidence score of the first dataset is calculated by averaging the confidence scores of all data units within it. The similarity score between the first dataset and the active dataset is determined by comparing the similarity of their multi-dimensional vectors. A high similarity score indicates that the content of the first dataset is also active and easily used to generate response data. The retrieval count of the first dataset represents the number of times it has been used as the target dataset to generate response data. The sentiment value of the first dataset represents its sentiment level; different sentiments correspond to different sentiment values.
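A hedged sketch of combining the four parameters is given below. It assumes the parameters are combined as a weighted sum, and it additionally scales the retrieval count into the range 0 to 1 so that it is comparable with the other terms. The weight values, the cap `max_retrievals`, and that scaling step are all assumptions for illustration and are not stated in the application.

```python
def vitality_score(confidence, similarity, retrievals, sentiment,
                   weights=(0.4, 0.3, 0.2, 0.1), max_retrievals=100):
    """Weighted combination of the four parameters (assumed form and weights)."""
    w_c, w_s, w_n, w_e = weights
    # Assumed normalisation: cap the count and scale it to [0, 1].
    retrieval_term = min(retrievals, max_retrievals) / max_retrievals
    return (w_c * confidence
            + w_s * similarity
            + w_n * retrieval_term
            + w_e * sentiment)


score = vitality_score(confidence=0.8, similarity=0.5, retrievals=50, sentiment=1.0)
print(round(score, 2))
```

With these assumed weights, a data set that is confident, similar to the active data set, and frequently retrieved scores higher than one that is rarely used, which is the ordering the text describes.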

[0048] In this application, the vitality score of the first data set is calculated using the above four parameters, which can objectively reflect whether the data in the data units of the first data set is important and frequently used. Therefore, based on the level of the vitality score, unimportant data sets can be removed from the first data sets, thereby improving the accuracy of the obtained target data set.

[0049] As an optional example, the above method further includes: after a preset number of dialogues have been conducted through the dialogue window, identifying, among all data sets in the database, data set pairs in which the two data sets do not match each other; calculating the credibility scores of the two data sets in each data set pair; and lowering the vitality score or confidence level of the data set with the smaller credibility score when the difference between the credibility scores of the two data sets is greater than a preset difference.

[0050] In this application, for the datasets in the database, due to the significant differences in the multidimensional vectors of data units in different datasets, there may be situations where the data in different datasets are contradictory or mismatched. For example, during the process of inputting dialogue data in the past, a user may have said "I hate rain," but also inputted poems describing the beauty of rainy weather. After these two dialogue data are vectorized and stored as data units in the database, they are divided into different datasets. The data in the two datasets constitute two contradictory data points: the user may like rain, or they may dislike rain. This part of the data needs to be corrected again.

[0051] This application allows the data sets in the database to be corrected after multiple rounds of dialogue with the user. For example, a correction operation can be performed every 10 rounds of dialogue. During correction, either every data set in the database is compared with the others to identify contradictory data sets, or only the data sets that have changed are compared. Each data set pair consists of two data sets that contradict each other and therefore require correction. During correction, a credibility score is calculated for each data set in the pair; a higher credibility score indicates that the data set is more credible. If one data set has a high credibility score and the other a low score, the vitality score or confidence level of the data set with the lower score is lowered, and the data set with the higher credibility score is retained. If the credibility scores of the two data sets are similar, a query can be generated for the user based on the data from both data sets, such as asking whether the user likes rain. Based on the user's answer (likes or dislikes rain), the vitality score or confidence level of the data set containing the incorrect data is adjusted.

[0052] In this application, the vitality score or confidence level of the data sets in the database is adjusted every few rounds of dialogue, which helps keep the data sets accurate and improves the accuracy of the generated response data.

[0053] As an optional example, for all data sets in the database, determining data set pairs involves calculating the contradiction score of any two data sets using the following formula. If the contradiction score is greater than a preset score, the two data sets are identified as a data set pair:

$$\mathrm{Contra}(D_i, D_j) = \max\bigl(\mathrm{NLI}(D_i, D_j),\; 1 - \cos(\bar{\mathbf{v}}_i, \bar{\mathbf{v}}_j)\bigr) \qquad (3)$$

where $\mathrm{Contra}(D_i, D_j)$ is the contradiction score of $D_i$ and $D_j$, $D_i$ is data set i, $D_j$ is data set j, $\mathrm{NLI}(D_i, D_j)$ is the model-predicted contradiction value of $D_i$ and $D_j$, $\bar{\mathbf{v}}_i$ is the average of the multi-dimensional vectors of the data units in $D_i$, and $\bar{\mathbf{v}}_j$ is the average of the multi-dimensional vectors of the data units in $D_j$.

[0054] This application provides a method for determining whether data sets in a database constitute a data set pair. In this method, a contradiction score is calculated for every two data sets using Formula 3 above. If the contradiction score exceeds a preset score, the two data sets are considered to constitute a data set pair. When calculating the contradiction score, a natural language inference (NLI) model is used to calculate the contradiction value of the data in the two data sets. In addition, the cosine of the averages of the multidimensional vectors of the data units in the two data sets is calculated, and this cosine value is subtracted from 1 to obtain the degree of difference between the two data sets. The larger of the contradiction value and the degree of difference is used as the contradiction score to determine the data set pair.

[0055] For every two data sets in the database, after calculating the contradiction score using Formula 3 above, if the contradiction score is greater than the preset score, a pair of data sets is obtained; if the contradiction score is less than or equal to the preset score, the two data sets are not contradictory.

[0056] This application uses Formula 3 to determine data set pairs, thereby filtering out contradictory data set pairs in the database for further correction and improving the accuracy of the data in the database.
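The pairwise check built on Formula 3 can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the function names are hypothetical, the NLI contradiction value is assumed to be supplied by an external model, a missing NLI value is assumed to count as 0, and the default preset score of 0.8 is the example value given later in the description.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mean_vector(vectors):
    """Element-wise average of the data-unit vectors of one data set."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def contradiction_score(nli_contradiction, vectors_i, vectors_j):
    """Larger of the NLI contradiction value and the degree of
    difference (1 - cosine of the two mean vectors)."""
    difference = 1.0 - cosine(mean_vector(vectors_i), mean_vector(vectors_j))
    return max(nli_contradiction, difference)

def find_contradictory_pairs(data_sets, nli_values, preset_score=0.8):
    """Return every pair of data-set ids whose contradiction score exceeds the preset score."""
    pairs = []
    for i, j in combinations(sorted(data_sets), 2):
        nli = nli_values.get((i, j), 0.0)  # assumption: missing NLI value counts as 0
        if contradiction_score(nli, data_sets[i], data_sets[j]) > preset_score:
            pairs.append((i, j))
    return pairs
```

With this structure, a pair can be flagged either by the NLI value alone (similar vectors, contradictory statements) or by the vector difference alone.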

[0057] As an optional example, calculating the credibility scores of the two data sets in a data set pair includes calculating the credibility score of each data set using the following formula:

$$\mathrm{Cred}(C_j) = \alpha \cdot \bar{c}_j + \beta \cdot \left(1 - \frac{\Delta t_j}{T}\right) \qquad (4)$$

where $\mathrm{Cred}(C_j)$ is the credibility score of $C_j$, $C_j$ is data set j, $\bar{c}_j$ is the average confidence level of the data units in the data set, $\Delta t_j$ is the time difference between the most recent update of the data set and the current time, $T$ is a fixed time window, and $\alpha$ and $\beta$ are weights.

[0058] In this application, for each data set pair, the credibility score of each data set is calculated first, and the decision on how to correct the two data sets is then made based on these scores. The credibility score can be calculated using Formula 4 above. The average confidence level of the data units within the data set is multiplied by its weight to obtain a first result. The time since the most recent update is divided by the fixed time window to determine how old the data in the data set is; this ratio is subtracted from 1, and the difference is multiplied by its weight to obtain a second result. Summing the first and second results gives the credibility score. Therefore, the lower the confidence level and the older the data, the lower the credibility score. After the credibility scores of the two data sets are calculated using Formula 4, if one data set has a high credibility score and the other a low one, the vitality score or confidence level of the data set with the lower credibility score is lowered. If the credibility scores of the two data sets are similar, query data can be generated for the user based on the data from both data sets, and the vitality score or confidence level of the data set with erroneous data can be adjusted based on the user's response.

[0059] This application calculates the credibility score of each data set in a data set pair using Formula 4 above. Thus, the lower the confidence level of the data set and the older its data, the lower the credibility score, so that the credibility score is calculated accurately.
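The credibility calculation described above can be sketched as follows. This is an illustrative sketch only: the function name, the weight values 0.6 and 0.4, the use of days as the time unit, and the clamping of data older than the time window are assumptions that do not come from the source.

```python
def credibility_score(unit_confidences, age_days, window_days, alpha=0.6, beta=0.4):
    """Weighted sum of average unit confidence and freshness.

    alpha and beta are hypothetical weights; the source only says they are weights.
    """
    avg_confidence = sum(unit_confidences) / len(unit_confidences)
    # Assumption: data older than the window is treated as fully stale (freshness 0).
    freshness = 1.0 - min(age_days / window_days, 1.0)
    return alpha * avg_confidence + beta * freshness
```

A data set made of fresh user input scores near 1.0, while an old data set made of inferred or web data scores much lower, which matches the behavior the description states.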

[0060] As an optional example, as shown in Figure 4, querying the database, based on the dialogue data, for the first data set that matches the dialogue data includes: S402, generating prediction data based on the dialogue data; S404, querying the database for data sets matching the dialogue data, and querying the database for data sets matching the prediction data; S406, deduplicating the retrieved data sets and using the result as the first data set.

[0061] In this application, when obtaining user dialogue data and querying the database for a first dataset, the top few datasets matching the dialogue data can be combined to form the first dataset. For example, with 100 datasets, the matching degree between the dialogue data and each dataset is determined, and the top 5 datasets are selected as the first dataset based on their matching degree. Alternatively, in determining the datasets matching the dialogue data, a vitality score can be introduced as a coefficient to determine the matching degree between the dataset content and the dialogue data. This matching degree is then multiplied by the vitality score coefficient to obtain a matching value. Based on the matching value, the top few datasets are selected as the first dataset.

[0062] In this application, the first data set can be determined not only based on dialogue data, but also based on dialogue data and its predicted data. Specifically, a subset of matching data sets is determined based on the dialogue data, and another subset of matching data sets is determined based on the predicted data. After removing duplicates from the two sets, the first data set is obtained.

[0063] This application determines the first data set by introducing prediction data, so that data sets related to the prediction data are also included in the first data set, which helps in generating the response data.
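Steps S404 and S406 amount to a union of two result lists with duplicates removed. A minimal sketch, assuming data sets are referred to by hypothetical string ids and that each retained data set records which retrieval path found it:

```python
def first_data_set(regular_hits, predictive_hits):
    """Merge both retrieval results, keeping each data-set id once.

    Returns a dict mapping data-set id to the set of roles
    ("regular", "predictive") under which it was retrieved.
    """
    merged = {}
    for role, hits in (("regular", regular_hits), ("predictive", predictive_hits)):
        for data_set_id in hits:
            merged.setdefault(data_set_id, set()).add(role)
    return merged
```

For example, `first_data_set(["C1", "C2"], ["C2", "C7"])` keeps `C2` only once while recording that it was found by both retrieval paths.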

[0064] As an optional example, determining the target data set from the first data set according to the vitality score includes: calculating the value score of the data set in the first data set that matches the predicted data, wherein the value score is related to the confidence level and vitality score of the data set; deleting the data set with a value score lower than a preset value from the first data set; and determining the remaining data set as the target data set.

[0065] In this application, after determining the first dataset, the datasets with value scores lower than a preset value can be deleted, and the remaining datasets can be used as the target dataset to determine the response data. The value score is calculated based on the dataset's confidence level and vitality score; higher confidence and vitality scores result in a higher value score.

[0066] The following example illustrates a user's question-and-answer session via a dialogue window. In this scenario, the dialogue data is the user input, which may be text, images, or voice. In this application, the heterogeneous user input (text, images, audio) is transformed into data units in a unified format, providing a foundation for subsequent aggregation, metabolism, and retrieval. The system first receives the user input, such as the text "I hate rainy days, they make me feel down," a photo of a rainy scene, or the voice message "The sound of rain relaxes me," and stores it as raw data in the format { … }. Next, a multimodal large model is used to map the data to a unified semantic space: text is encoded into a 768-dimensional vector by a sentence model (Sentence-BERT, SBERT); an image is encoded into a 512-dimensional vector by a multimodal model (Contrastive Language–Image Pre-training, CLIP) and then linearly transformed to 768 dimensions; audio is transcribed into text by the Whisper speech recognition model and then encoded by SBERT in the same way as text.

[0067] Regardless of whether the input is text, speech, or an image, the output is a unified semantic vector and data information (metadata), which may include: the modality of the dialogue data (text, image, or audio); a unique identifier of the dialogue data, indicating in which dialogue the data was generated; a timestamp, which records the specific time when the dialogue data was created; an initial credibility; and an atomic type, that is, the category of the dialogue data as parsed by the model (fact, opinion, sentiment, intention, etc.). The initial credibility takes different values depending on the source. User input = 1.0, the highest credibility: the content comes directly from the user's own words, uploaded photos, voice, and so on, and represents the user's most authentic subjective expression or factual statement. The system treats this as the most reliable source by default, since no intermediate step can introduce distortion or error, and therefore assigns the full score of 1.0. Model output or inferred data = 0.7, medium-to-high credibility: the content is obtained by a large language model (LLM, such as GPT-4o) from user input through semantic parsing, reasoning, summarization, sentiment classification, or intention inference, so its initial credibility is lower than that of user input. Web retrieval = 0.5, medium-to-low credibility: the content comes from web searches, crawled web pages, knowledge bases, and so on. Web information often suffers from poor timeliness, inconsistent source quality, misinformation, advertorials, and conflicting viewpoints, so the system takes a more cautious approach and assigns an initial credibility of 0.5.
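A data unit carrying the vector and metadata listed above might be represented as follows. This is a sketch under stated assumptions: the class name, the field names, and the source labels are hypothetical, since the original field names are not legible in the text. Only the three initial credibility values (1.0, 0.7, 0.5) come from the description.

```python
import time
from dataclasses import dataclass, field

# Initial credibility by source, as given in the description.
INITIAL_CREDIBILITY = {
    "user_input": 1.0,
    "model_inference": 0.7,
    "web_retrieval": 0.5,
}

@dataclass
class DataUnit:
    vector: list        # unified semantic vector (768 dimensions in the example)
    modality: str       # "text", "image" or "audio"
    dialogue_id: str    # identifies the dialogue in which the data was generated
    atomic_type: str    # "fact", "opinion", "sentiment", "intention", ...
    source: str         # hypothetical label, a key of INITIAL_CREDIBILITY
    timestamp: float = field(default_factory=time.time)
    credibility: float = field(init=False)

    def __post_init__(self):
        # The initial credibility is derived from the source, not passed in.
        self.credibility = INITIAL_CREDIBILITY[self.source]
```

Deriving the credibility from the source label keeps the three values in one place, so a unit cannot be created with a credibility that contradicts its origin.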

[0068] In this application, the user-input dialogue data is converted into a 768-dimensional vector, which can then be compared with the data sets in the database. Furthermore, the user-input dialogue data is also converted into the aforementioned unified semantic vector and data information, and stored in the database as a data unit.

[0069] To generate the data sets in the database, the user-input dialogue data, the model inference and judgment data, and the web-crawled data are all converted into unified semantic vectors and data information and stored as data units. All data units are aggregated using Formula 1 above to obtain the multiple data sets of the database.
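The aggregation step can be sketched as follows: a new data unit joins the best-matching existing data set if the matching degree exceeds a preset value, and otherwise starts a new data set. The sketch assumes that the matching degree is the cosine similarity between the unit vector and the mean vector of the data set. The preset value 0.75, the function names, and the id scheme are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mean_vector(vectors):
    """Element-wise average of the data-unit vectors of one data set."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def assign_unit(unit_vector, data_sets, preset_match=0.75):
    """Store the unit in the best-matching data set, or create a new one.

    data_sets maps a data-set id to the list of its unit vectors and is
    modified in place. Returns the id the unit was stored under.
    """
    best_id, best_score = None, preset_match
    for data_set_id, vectors in data_sets.items():
        score = cosine(unit_vector, mean_vector(vectors))
        if score > best_score:
            best_id, best_score = data_set_id, score
    if best_id is None:
        best_id = f"C{len(data_sets) + 1}"  # hypothetical id scheme
        data_sets[best_id] = []
    data_sets[best_id].append(unit_vector)
    return best_id
```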

[0070] Each data set in the database can be assigned a vitality score (VS) using Formula 2 above. The vitality score is like a comprehensive health score for each data set, determining how long it survives in the system and how likely it is to be prioritized in memory. It is a weighted sum of four components. Credibility (30% weight): the average credibility of the data units in the data set; the more user input (1.0) and the less model inference (0.7) and web content (0.5) the data set contains, the higher the average and the more reliable the data set. Coherence (20% weight): the consistency between the data set and other frequently used data sets, with a higher score indicating a more consistent "storyline". Utility (30% weight): the logarithm of the number of times the data set has been retrieved / used, with more frequent calls indicating greater "usefulness" and the score increasing gradually but steadily. Emotional Salience (20% weight): the intensity of emotion (e.g., strong dislike, happiness, anxiety), measured using an emotion model and normalized to 0-1, with stronger emotions resulting in a higher score.
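The weighted sum can be sketched as follows, using the 30/20/30/20 weights from the description. The function name and the normalization of the utility term are assumptions: the source only says that utility is the logarithm of the retrieval count, so this sketch uses `log1p` to handle a count of 0 and divides by a hypothetical cap of 100 retrievals to keep the term within 0-1.

```python
import math

# Component weights as stated in the description.
WEIGHTS = {"credibility": 0.3, "coherence": 0.2, "utility": 0.3, "salience": 0.2}

def vitality_score(credibility, coherence, retrievals, salience, max_retrievals=100):
    """Weighted sum of the four components, each expected in the range 0-1."""
    # Assumption: log(1 + n) scaled by a cap so that utility stays within 0-1.
    utility = min(math.log1p(retrievals) / math.log1p(max_retrievals), 1.0)
    return (WEIGHTS["credibility"] * credibility
            + WEIGHTS["coherence"] * coherence
            + WEIGHTS["utility"] * utility
            + WEIGHTS["salience"] * salience)
```

Because the utility term is logarithmic, the first few retrievals raise the score noticeably, while later retrievals add progressively less.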

[0071] There may be contradictions between data sets in the database. Therefore, the contradiction score between two data sets can be calculated using Formula 3 above, thereby identifying the contradictory data set pairs.

[0072] In practice, the NLI term of Formula 3 measures the semantic "degree of contradiction" or "probability of logical conflict" between two data sets; specifically, it is the contradiction score output for a premise-hypothesis pair. A higher contradiction score indicates that the model believes that, if the content of one data set is taken as the premise and the other as the hypothesis, the hypothesis is almost certainly false, meaning the two are logically contradictory. Representative texts of the two data sets (usually a central sentence, a summary, or a natural language description generated by fusing all data units, such as "Data set 1: the user hates rainy days because they make them feel down" vs. "Data set 2: the user finds the sound of rain relaxing and healing") are input into the NLI model (RoBERTa) as a premise-hypothesis pair. RoBERTa outputs three logits, corresponding to three categories: contradiction (if the premise is true, the hypothesis must be false); entailment (if the premise is true, the hypothesis must be true); and neutral (the premise does not determine the truth value of the hypothesis). Softmax converts the logits into probabilities, and the probability of the "contradiction" category (between 0 and 1; the closer to 1, the more contradictory) is taken as the contradiction value.
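The conversion from the three logits to a contradiction probability is a standard softmax. A minimal sketch, assuming the logits have already been produced by an NLI model; the function name is hypothetical, and the position of the "contradiction" class is an assumption because label order differs between model checkpoints:

```python
import math

def contradiction_probability(logits, contradiction_index=0):
    """Softmax over the NLI logits, returning the 'contradiction' probability.

    contradiction_index is an assumption; check the label order of the
    model actually used.
    """
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    return exps[contradiction_index] / sum(exps)
```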

[0073] Formula 3 above considers both hard logical conflicts (a high contradiction score from NLI) and soft semantic inconsistencies (the vectors are very different → 1-cos is larger), taking the more severe of the two as the final contradiction score. When the contradiction score > 0.8, the two data sets are marked as a contradictory data set pair.

[0074] For each pair of data sets, a credibility score is calculated using Formula 4 above. The core logic is "who is more credible, who is more recent → who has more say," with the aim of prioritizing the more reliable and recent data set without arbitrarily deleting any data sets, while minimizing disruption to users.

[0075] When the system detects a contradictory data set pair (e.g., data set 1 "hates rain" vs. data set 2 "likes the sound of rain") with a contradiction score > 0.8, it does not directly delete either data set. Instead, it first calculates the credibility score of each data set and compares the two scores. If the difference is large, one data set clearly dominates (for example, the user recently confirmed that they "actually like rain"), so the system automatically trusts the high-scoring data set. The confidence of the low-scoring data set is lowered as a whole (or its VS is lowered); if its VS drops below 0.3, it is moved to the archive (inactive, but not deleted, and restorable in the future). The high-scoring data set is retained or strengthened.

[0076] If the difference is very small (<0.1), the two data sets are roughly "equal" (e.g., both are old memories, or both have similar reliability), and the system asks the user a gentle question such as, "You previously said you hated rain, but recently you mentioned liking the sound of rain. Has your preference changed?"

[0077] Asking the user a question is itself a new interaction within the window: the conversation continues, the question is counted in the K=10 dialogue-round counter, and the user's reply is new user input that may update the data sets and trigger new aggregation / metabolism. However, the question is a system-initiated "clarification intervention" rather than a user-initiated question, so it is not treated as "user-triggered introspection" but as a natural extension of the conflict resolution process. After the user replies (e.g., "I don't actually dislike it anymore"), the system updates the confidence of the corresponding data set accordingly (user confirmation → approaching 1.0), recalculates the VS, and decides which data set to downgrade to the archive, thus completing the resolution of the data set pair.
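The resolution policy for a contradictory pair can be sketched as a small decision function. Only the 0.1 "very small" gap and the 0.3 archive threshold come from the description. The threshold for a "large" gap, the penalty applied to the vitality score, the handling of gaps between the two thresholds, and all names are assumptions made for illustration.

```python
def resolve_pair(cred_a, cred_b, small_gap=0.1, large_gap=0.3):
    """Decide how to handle a contradictory pair from its two credibility scores."""
    gap = abs(cred_a - cred_b)
    if gap < small_gap:
        return "ask_user"  # scores are close: ask a clarifying question
    if gap > large_gap:    # assumption: 0.3 stands in for the preset difference
        return "demote_a" if cred_a < cred_b else "demote_b"
    return "no_action"     # assumption: the source does not cover this range

def demote(vitality, penalty=0.2, archive_below=0.3):
    """Lower a vitality score and report whether the data set should be archived."""
    lowered = max(vitality - penalty, 0.0)  # penalty size is hypothetical
    return lowered, lowered < archive_below
```

Archiving rather than deleting means a demoted data set can still be restored if the user later confirms it.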

[0078] For user dialogue data, a first data set can be retrieved from the data sets in the database. In this application, proactive intervention can be introduced: prediction data is generated from the user's dialogue data, and the data sets are then searched based on both the dialogue data and the prediction data.

[0079] Regular retrieval with dialogue data: the vector of the user's current query Q (e.g., "What will the weather be like tomorrow?") is compared directly with all data sets to calculate their similarity, and the Top-K (K=5) data sets with a similarity greater than θ_sim=0.7 are returned. These data sets provide "immediately relevant" memories, such as C1 ("the user hates rainy days and feels down") or external weather data units (if web retrieval is available). Their core role is to support the core facts or direct answers in the response, such as "It might rain tomorrow, so bring an umbrella."
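Regular retrieval can be sketched as a thresholded Top-K search. The values K=5 and θ_sim=0.7 come from the description. The use of cosine similarity against each data set's mean vector, and all names, are assumptions made for this sketch.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def regular_retrieval(query_vector, data_set_means, k=5, theta_sim=0.7):
    """Return up to k data-set ids whose similarity to the query exceeds
    theta_sim, ordered from most to least similar."""
    scored = [(cosine(query_vector, mean), data_set_id)
              for data_set_id, mean in data_set_means.items()]
    hits = sorted((item for item in scored if item[0] > theta_sim), reverse=True)
    return [data_set_id for _, data_set_id in hits[:k]]
```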

[0080] Forward-looking retrieval with prediction data: a Long Short-Term Memory (LSTM) network predicts future topic vectors (M = 3 prediction data points, such as T1 = "Destination Recommendation", T2 = "Emotional Management", T3 = "Activity Suggestions"). For each predicted topic, the Top-K related data sets are retrieved, and the predictive intervention value of each retrieved data set is then calculated as (prediction confidence, e.g., 0.9) × (vitality score of the data set) × …. Only the data sets whose predictive intervention value is not lower than a preset value are kept.
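The filtering of predictive results can be sketched as follows. This is a simplified illustration: only the two legible factors of the product (prediction confidence and vitality score) are used, and the preset value of 0.5 and all names are hypothetical.

```python
def filter_predictive_hits(hits, preset_value=0.5):
    """Keep data sets whose predictive intervention value reaches the preset value.

    hits is a list of (prediction_confidence, data_set_id, vitality_score).
    Assumption: value = prediction confidence x vitality score; any further
    factor used by the source is omitted here.
    """
    kept = []
    for prediction_confidence, data_set_id, vitality in hits:
        value = prediction_confidence * vitality
        if value >= preset_value:
            kept.append((data_set_id, round(value, 3)))
    return kept
```

With a prediction confidence of 0.9, a data set with vitality 0.8 is kept (value 0.72), while one with vitality 0.3 is dropped (value 0.27).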

[0081] Before fusion, the system filters out duplicate or low-relevance items from the data sets retrieved with the dialogue data and with the prediction data. (For example, if an item appears in both the regular and the predictive results, it is retained only once, but its dual role is labeled.) At the same time, the system extracts the key elements of each retained item: the content vector and the metadata (including credibility, emotional salience, and atomic type, such as sentiment or intent). This ensures that the data input to the LLM is structured, for example as a JSON-like list.

[0082] This fusion transforms the agent from a "passive responder" into a "proactive caregiver," with routine retrieval providing accuracy and predictive retrieval providing foresight. In the aforementioned process, during predictive retrieval, a vitality score is used as a constraint, thereby removing data sets from the first dataset with lower predictive intervention value. The resulting target dataset is used to generate response data.

[0083] Figure 5 is a schematic diagram of a device for generating response data for a dialogue window provided in an embodiment of this application. As shown in Figure 5, the device includes: an acquisition module 502, used to acquire user dialogue data; a query module 504, used to query a database for a first data set that matches the dialogue data, wherein the database includes multiple data sets; a scoring module 506, used to obtain the vitality score of each first data set, wherein the vitality score is calculated based on the confidence level of the first data set, the similarity between the first data set and the active data set, the number of retrievals of the first data set, and the sentiment value of the first data set, and is used to represent the importance of the first data set; a determination module 508, used to determine the target data set from the first data sets according to the vitality score; and a generation module 510, used to generate response data for the dialogue data based on the target data set.

[0084] This application can be applied to the process of users interacting with intelligent agents through a dialogue window. The dialogue window is the interface for interaction between the user and the intelligent agent, which can be displayed on the client. Users can input dialogue data through the dialogue window and view response data through the dialogue window.

[0085] The type of intelligent agent in this application is not limited. The intelligent agent corresponds to a database, which stores data sets. The dialogue data input by the user is used to search the database for the first data set. After filtering, the target data set is obtained, and the response data is generated based on the target data set and fed back to the user.

[0086] There are multiple ways to obtain dialogue data in this application. For example, users can provide input by voice, by typed text, by text recognized from screenshots, by images, by screenshots, by photos, and so on. The dialogue window can use the text, voice, images, screenshots, or photos input by the user as dialogue data.

[0087] The database in this application can store data scraped from the web, as well as question-and-answer data, push data, and other data generated during historical conversations between the current user and other users. For private intelligent agents, data is not shared; only data scraped from the web and question-and-answer data generated during personal conversations are stored, and this database only provides services to the current user.

[0088] The database in this application may include multiple data sets, each used to store different types of data. One example is classifying the data and then dividing the classified data into different data sets. Another example is grouping data with high similarity into the same data set.

[0089] A data set in this application may include multiple data units, which are grouped into one data set because they are of the same type or have high similarity.

[0090] In this application, for user dialogue data, a first data set matching it can be retrieved from the database. The first data set is the set of data in the database that matches the dialogue data.

[0091] In this application, each data set in the database can correspond to a vitality score, which represents the importance of the data set. A higher importance score indicates that the data in that data set is more important and easier to use to generate response data; conversely, a lower importance score indicates that the data in that data set is less important and less likely to be used to generate response data. Data sets with an importance score below a certain level can be stored in a less frequently used database or storage space.

[0092] The vitality score of the first data set can be calculated either when the data units within the data set change or when the first data set is determined. The calculation involves several aspects: the confidence level of the first data set, the similarity between the first data set and the active data sets, the number of retrievals of the first data set, and the sentiment value of the first data set. The confidence level of the first data set is calculated from the confidence levels of all data units within it; for example, the confidence level of each data unit in the first data set is obtained, and the average of these values is taken as the overall confidence level of the first data set. Calculating the similarity between the first data set and the active data sets requires first determining the active data sets. An active data set is a data set that has been used multiple times as a target data set to generate response data; it can be determined from the number of times it has been used as a target data set in the past. The similarity is calculated by comparing the content of the data units in the two data sets. A high similarity indicates that the content of the current first data set is also important and likely to be used. The number of retrievals of the first data set is the number of times the first data set has been used as the target data set to generate response data. Finally, the sentiment value of the first data set represents the sentiment of the first data set. An intelligent model can identify the sentiment of every data unit in the first data set; the most frequent sentiment is then selected, and the sentiment value of that sentiment is taken as the sentiment value of the data set. Different sentiments correspond to different sentiment values.

[0093] After completing the steps described above to calculate the vitality score of the first dataset, a vitality score for the first dataset is obtained. After removing datasets with low vitality scores, the remaining datasets are used as the target dataset to generate the response data.

[0094] The solution provided in this application involves: acquiring user dialogue data; querying a database for a first data set matching the dialogue data, wherein the database includes multiple data sets; acquiring a vitality score for each first data set, wherein the vitality score is calculated based on the confidence level of the first data set, the similarity between the first data set and an active data set, the number of searches for the first data set, and the sentiment value of the first data set, and is used to represent the importance of the first data set; determining a target data set from the first data set according to the vitality score; and generating response data for the dialogue data based on the target data set. Thus, after finding a matching first data set based on the dialogue data, the vitality score is calculated based on the confidence level of the first data set, the similarity between the first data set and an active data set, the number of searches for the first data set, and the sentiment value of the first data set to obtain the importance of the first data set. Furthermore, the target data set is selected based on the vitality score to generate response data, thereby improving the accuracy of the generated response data.

[0095] For other examples of this embodiment, please refer to the examples above, which will not be repeated here.

[0096] As shown in Figure 6, this application provides an electronic device, including a processor 111, a communication interface 112, a memory 113, and a communication bus 114, wherein the processor 111, the communication interface 112, and the memory 113 communicate with each other through the communication bus 114. The memory 113 is used to store computer programs. In one embodiment of this application, the processor 111, when executing the program stored in the memory 113, implements the dialogue window response data generation method provided in any of the foregoing method embodiments.

[0097] This application also provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the dialog window response data generation method provided in any of the foregoing method embodiments.

[0098] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.

[0099] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented using software plus a general-purpose hardware platform, or of course, using hardware. Based on this understanding, the above technical solutions, in essence or the parts that contribute to the related technology, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.

[0100] It should be understood that the terminology used herein is for the purpose of describing particular exemplary embodiments only and is not intended to be limiting. Unless the context clearly indicates otherwise, the singular forms “a,” “an,” and “the” as used herein may also include the plural forms. The terms “comprising,” “including,” “containing,” and “having” are inclusive and therefore indicate the presence of the stated features, steps, operations, elements, and / or components, but do not exclude the presence or addition of one or more other features, steps, operations, elements, components, and / or combinations thereof. The method steps, processes, and operations described herein are not to be construed as requiring performance in the particular order described or illustrated, unless the order of performance is explicitly indicated. It should also be understood that additional or alternative steps may be used.

[0101] The above description is merely a specific embodiment of the present invention, enabling those skilled in the art to understand or implement the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the present invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features claimed herein.

Claims

1. A reply data generation method for a dialogue window, characterized in that the method includes: acquiring user dialogue data; querying, based on the dialogue data, a database for a first data set matching the dialogue data, wherein the database includes multiple data sets; acquiring a vitality score for each first data set, wherein the vitality score is calculated based on the confidence level of the first data set, the similarity between the first data set and the active data set, the number of retrievals of the first data set, and the sentiment value of the first data set, and is used to represent the importance of the first data set; determining a target data set from the first data sets according to the vitality score; and generating response data for the dialogue data based on the target data set.

2. The method according to claim 1, characterized in that the dialogue data includes at least one of text data, voice data, and image data, and, after acquiring the user dialogue data, the method further includes: converting the text data, voice data, or image data into multidimensional vectors using a multimodal large model, wherein the multidimensional vectors of the text data, voice data, and image data have the same dimension; and storing the data information of the dialogue data and the multidimensional vector as a data unit in a data set of the database, wherein the database also includes data units of historical dialogue data, data units of historical response data, and data units of web search content.

3. The method according to claim 2, characterized in that storing the data information of the dialogue data and the multidimensional vector as a data unit in a data set of the database includes: calculating the matching degree between the data unit and an existing data set using the following formula; when the matching degree is greater than a preset matching degree, storing the data unit in the matching data set; and if the data unit is not assigned to any data set, creating a new data set for the data unit:

$$\mathrm{Match}(u_i, C_j) = \cos(\mathbf{v}_i, \boldsymbol{\mu}_j) \qquad (1)$$

where $\mathrm{Match}(u_i, C_j)$ is the matching degree, $u_i$ is data unit i, $C_j$ is data set j, $\mathbf{v}_i$ is the multidimensional vector of data unit i, and $\boldsymbol{\mu}_j$ is the average of the multidimensional vectors of all data units in data set j.

4. The method according to claim 1, characterized in that acquiring the vitality score for each first data set includes calculating the vitality score using the following formula:

$$VS = w_1 \cdot \mathrm{Cred} + w_2 \cdot \mathrm{Coh} + w_3 \cdot \log N + w_4 \cdot \mathrm{Emo} \qquad (2)$$

where $VS$ is the vitality score, $\mathrm{Cred}$ is the confidence level of the first data set, $\mathrm{Coh}$ is the similarity between the first data set and the active data set, $N$ is the number of retrievals of the first data set, $\mathrm{Emo}$ is the sentiment value of the first data set, and $w_1$, $w_2$, $w_3$, $w_4$ are weights.

5. The method according to claim 1, characterized in that the method further includes: after a preset number of dialogue rounds through the dialogue window, determining, among all data sets in the database, a data set pair, wherein the two data sets in the data set pair do not match each other; calculating the credibility scores of the two data sets in the data set pair; and when the difference between the credibility scores of the two data sets is greater than a preset difference, lowering the vitality score or confidence level of the data set with the smaller credibility score.

6. The method according to claim 5, characterized in that determining the data set pair among all data sets in the database includes: calculating the contradiction score of any two data sets using the following formula, and, if the contradiction score is greater than a preset score, identifying the two data sets as a data set pair:

$$\mathrm{Contra}(C_i, C_j) = \max\bigl(\mathrm{NLI}(C_i, C_j),\ 1 - \cos(\boldsymbol{\mu}_i, \boldsymbol{\mu}_j)\bigr) \qquad (3)$$

where $\mathrm{Contra}(C_i, C_j)$ is the contradiction score of $C_i$ and $C_j$, $C_i$ is data set i, $C_j$ is data set j, $\mathrm{NLI}(C_i, C_j)$ is the contradiction score of $C_i$ and $C_j$ predicted by the model, $\boldsymbol{\mu}_i$ is the average of the multidimensional vectors of the data units in $C_i$, and $\boldsymbol{\mu}_j$ is the average of the multidimensional vectors of the data units in $C_j$.

7. The method according to claim 5, characterized in that calculating the confidence scores of the two data sets in the data set pair includes: calculating the confidence score of each data set using formula (4) [formula (4) not reproduced], where the terms of formula (4) are the confidence score of data set j, data set j, the average confidence level of the data units in data set j, the time difference between the most recent update of data set j and the current time, a fixed time window, and two weights.
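
To make claim 7 concrete, the sketch below scores a data set from the average confidence of its data units and from how recently it was updated. Formula (4) is not reproduced here, so the exponential recency term and the default weights `alpha` and `beta` are assumptions, and `confidence_score` is a hypothetical name.

```python
import math


def confidence_score(unit_confidences, seconds_since_update, window_seconds,
                     alpha=0.7, beta=0.3):
    """Blend average unit confidence with recency (assumed form).

    unit_confidences: confidence level of each data unit in the data set.
    seconds_since_update: time since the data set was last updated.
    window_seconds: the fixed time window used to scale the time difference.
    """
    average = sum(unit_confidences) / len(unit_confidences)
    # Assumption: recency decays exponentially relative to the time window.
    recency = math.exp(-seconds_since_update / window_seconds)
    return alpha * average + beta * recency
```

With this form, two data sets with the same average unit confidence are separated by freshness alone: the more recently updated one receives the higher confidence score.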

8. The method according to claim 1, characterized in that querying the database for a first data set that matches the dialogue data includes: generating prediction data based on the dialogue data; querying the database for data sets that match the dialogue data, and querying the database for data sets that match the prediction data; and deduplicating the queried data sets and using the result as the first data set.
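
The two-query retrieval in claim 8 can be sketched as below. The callables `predict` and `search` stand in for the prediction model and the database query, neither of which the claim specifies; they and the function name `first_data_sets` are hypothetical.

```python
def first_data_sets(dialogue, predict, search):
    """Query with the dialogue data and the prediction data, then deduplicate.

    predict(dialogue) returns prediction data for the dialogue.
    search(text) returns an iterable of matching data set ids.
    Returns the ids in first-seen order with duplicates removed.
    """
    prediction = predict(dialogue)
    hits = list(search(dialogue)) + list(search(prediction))
    # dict.fromkeys keeps the first occurrence of each id and preserves order.
    return list(dict.fromkeys(hits))
```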

9. The method according to claim 8, characterized in that determining the target data set from the first data set according to the vitality score includes: calculating a value score for each data set in the first data set that matches the prediction data, wherein the value score is related to the confidence score and the vitality score of the data set; deleting from the first data set any data set whose value score is lower than a preset value; and determining the remaining data sets as the target data set.
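
The filtering in claim 9 can be illustrated as follows. The claim says only that the value score is related to the confidence score and the vitality score, so taking their product is an assumption, as are the record layout, the threshold `min_value`, and the name `select_targets`.

```python
def select_targets(first_sets, min_value=0.3):
    """Drop prediction-matched data sets whose value score is too low.

    first_sets is a list of dicts with the keys "id", "confidence",
    "vitality" and "from_prediction" (True if the data set was matched
    by the prediction data). Returns the ids of the target data sets.
    """
    targets = []
    for data_set in first_sets:
        if data_set["from_prediction"]:
            # Assumed value score: product of confidence and vitality.
            value = data_set["confidence"] * data_set["vitality"]
            if value < min_value:
                continue  # delete from the first data set
        targets.append(data_set["id"])
    return targets
```

Only data sets matched by the prediction data are scored, following the claim's wording; data sets matched directly by the dialogue data pass through unchanged.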

10. A device for generating response data for a dialogue window, characterized in that the device includes: an acquisition module for acquiring user dialogue data; a query module for querying a database for a first data set that matches the dialogue data, wherein the database includes multiple data sets; a scoring module for obtaining a vitality score for each of the first data sets, wherein the vitality score is calculated based on the confidence level of the first data set, the similarity between the first data set and the active data set, the number of times the first data set is retrieved, and the sentiment value of the first data set, and is used to represent the importance of the first data set; a determination module for determining a target data set from the first data sets according to the vitality score; and a generation module for generating response data for the dialogue data based on the target data set.