Method and system for processing conversational data

By performing clustering calculations on the semantic vectors of dialogue data, this approach solves the problem of relying on manual annotation in existing technologies for dialogue data processing. It enables the extraction of key information under unsupervised learning, improving the efficiency and transferability of dialogue data processing.

CN115221296B · Active · Publication Date: 2026-06-23 · ALIBABA (CHINA) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ALIBABA (CHINA) CO LTD
Filing Date
2022-06-07
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing approaches to processing dialogue data rely on manual annotation, which consumes substantial manpower and time. As a result, there is no general method for annotating and processing dialogue data that can be transferred to downstream applications.

Method used

The method acquires dialogue data, extracts a semantic vector for each of the multiple rounds of dialogue within the dialogue data, and then performs clustering calculations on these semantic vectors to determine the key information for each category.

Benefits of technology

It enables the automatic extraction of key information, especially topics and key phrases, from dialogue data under unsupervised learning conditions, reducing reliance on manual annotation, improving the efficiency and transferability of dialogue data processing, and supporting the application of large-scale dialogue data.

✦ Generated by Eureka AI based on patent content.

Figure CN115221296B_ABST

Abstract

Embodiments of the present specification provide a method and system for processing dialogue data, wherein the method comprises: obtaining dialogue data, wherein the dialogue data comprises a plurality of rounds of dialogue; extracting a semantic vector of each of the plurality of rounds of dialogue; performing clustering calculation using the semantic vector of each of the plurality of rounds of dialogue to obtain a clustering result; and determining key information corresponding to each category according to the clustering result, so that the task of processing dialogue data can be practically applied to large-scale dialogue data in real scenarios.

Description

Technical Field

[0001] This specification relates to the field of computer technology, and in particular to a method for processing dialogue data.

Background Technology

[0002] With the development of science and technology, communicating with others online and through intelligent dialogue devices has become a common way of communication. Consequently, the amount of online dialogue data and human-computer dialogue data is increasing daily. Based on this massive amount of dialogue data, data mining of dialogue data is of great significance for improving communication methods. By processing dialogue data, downstream tasks such as dialogue topic distribution statistics, dialogue keyword extraction, dialogue structure learning, and dialogue summarization can be assisted.

[0003] Currently, the industry uses supervised learning to process dialogue data. However, manually labeling massive amounts of dialogue data is labor-intensive and time-consuming, resulting in a lack of general methods that can be transferred to downstream applications.

Summary of the Invention

[0004] In view of this, embodiments of this specification provide a method for processing dialogue data. One or more embodiments of this specification also relate to a system for processing dialogue data, a computing device, a computer-readable storage medium, and a computer program, to address the technical deficiencies existing in the prior art.

[0005] According to a first aspect of the embodiments of this specification, a method for processing dialogue data is provided, comprising: acquiring dialogue data, wherein the dialogue data includes multiple rounds of dialogue; extracting semantic vectors of each of the multiple rounds of dialogue; performing clustering calculation using the semantic vectors of the multiple rounds of dialogue to obtain clustering results; and determining key information corresponding to each category based on the clustering results.

[0006] Optionally, the key information is a topic, and determining the key information corresponding to each category based on the clustering results includes: using the clustering label of each category as the topic corresponding to each round of dialogue in that category based on the clustering results.

[0007] Optionally, the key information is key dialogue, and determining the key information corresponding to each category based on the clustering results includes: calculating the vector distance between each round of dialogue in a category and the center point of that category based on the clustering results; and selecting, from each category, the round of dialogue closest to the center point as the key dialogue based on the vector distance.
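As a non-limiting illustration, the selection of the round of dialogue closest to the center point of each category could be sketched as follows. The function name `select_key_rounds`, the use of the component-wise mean as the center point, and the use of Euclidean distance are assumptions made for this sketch; the specification only refers to a "vector distance".

```python
import math
from collections import defaultdict

def select_key_rounds(vectors, labels):
    """Return {cluster label: index of the round closest to that cluster's center}."""
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)

    key_rounds = {}
    for label, idxs in members.items():
        dim = len(vectors[idxs[0]])
        # Center point taken as the component-wise mean of the member vectors (assumption).
        center = [sum(vectors[i][d] for i in idxs) / len(idxs) for d in range(dim)]
        # Euclidean distance is an assumption; any vector distance could be substituted.
        key_rounds[label] = min(idxs, key=lambda i: math.dist(vectors[i], center))
    return key_rounds

# Toy semantic vectors for six rounds of dialogue, already assigned to two categories.
vectors = [[0.0, 0.0], [0.2, 0.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [5.0, 5.4]]
labels = [0, 0, 0, 1, 1, 1]
key_rounds = select_key_rounds(vectors, labels)
```

In this toy input, round 1 is chosen for category 0 and round 5 for category 1, because they lie nearest to the respective means.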

[0008] Optionally, the method further includes: pushing the key phrases to the dialogue construction module, so that the dialogue construction module uses the key phrases as the phrases of the dialogue nodes in the dialogue flow model to be constructed.

[0009] Optionally, extracting the semantic vector of each round of dialogue in the dialogue data includes: for each dialogue, concatenating multiple rounds of dialogue in that dialogue to obtain a dialogue sequence; adding a sequence identifier at the beginning of each dialogue sequence to distinguish the sequence, and adding a corresponding round identifier to each round of dialogue to distinguish the round of dialogue; inputting the dialogue sequence into a semantic vector extraction model to extract the semantic vectors of each of the multiple rounds of dialogue.

[0010] Optionally, the step of performing clustering calculations using the semantic vectors of the multiple rounds of dialogue to obtain clustering results includes: storing the semantic vectors of the multiple rounds of dialogue in a feature pool; using the feature pool as an initial segmentation range; randomly selecting a round of dialogue within the segmentation range as a segmentation point; calculating the difference between the forward and backward aggregated features of the segmentation point, and selecting the position with the largest difference as the optimized segmentation point of the segmentation range; using the optimized segmentation point to divide the rounds of dialogue within the segmentation range into two parts; and using the two parts of segmented rounds of dialogue as updated segmentation ranges. For a segmentation range where the number of rounds of dialogue does not meet the preset dialogue number requirement, the segmentation range is changed to a completed segmentation. If the number of optimized segmentation points does not meet the preset segmentation point requirement, for the updated segmentation range, the process returns to the step of randomly selecting a round of dialogue as a segmentation point within the segmentation range. If the number of optimized segmentation points meets the preset requirement, all optimized segmentation points are obtained as initialization points to initialize KMeans cluster centers, and the optimized segmentation points are adjusted using the KMeans clustering algorithm. The clustering result is then determined based on the adjusted optimized segmentation points.
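One possible reading of the segmentation-based initialization described above is sketched below. Several points are assumptions of this sketch rather than statements of the specification: the feature pool is treated as an ordered list, the "aggregated features" are taken to be the mean vectors before and after a candidate position, every position in a range is scanned instead of starting from a random one, and the vector located at each optimized segmentation point is used as an initial KMeans center. All function names are hypothetical.

```python
import math

def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def best_split(pool, lo, hi):
    """Position p in (lo, hi) where mean(pool[lo:p]) and mean(pool[p:hi]) differ most."""
    return max(range(lo + 1, hi),
               key=lambda p: math.dist(mean(pool[lo:p]), mean(pool[p:hi])))

def find_split_points(pool, n_points, min_rounds=2):
    """Split ranges repeatedly until n_points segmentation points are collected."""
    open_ranges, points = [(0, len(pool))], []
    while open_ranges and len(points) < n_points:
        lo, hi = open_ranges.pop(0)
        if hi - lo < min_rounds:
            continue  # too few rounds: treat this range as a completed segmentation
        p = best_split(pool, lo, hi)
        points.append(p)
        open_ranges += [(lo, p), (p, hi)]
    return points

def nearest(v, centers):
    """Index of the center closest to vector v."""
    return min(range(len(centers)), key=lambda c: math.dist(v, centers[c]))

def kmeans(pool, centers, n_iter=10):
    """Plain Lloyd iterations starting from the supplied centers."""
    centers = [list(c) for c in centers]
    for _ in range(n_iter):
        labels = [nearest(v, centers) for v in pool]
        for c in range(len(centers)):
            members = [v for v, label in zip(pool, labels) if label == c]
            if members:
                centers[c] = mean(members)
    return [nearest(v, centers) for v in pool], centers

# Toy feature pool: three rounds about one subject followed by three about another.
pool = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
points = find_split_points(pool, n_points=2)
labels, centers = kmeans(pool, [pool[p] for p in points])
```

Here the first split falls between the third and fourth rounds, and KMeans then keeps the two groups apart. The sketch scans exhaustively only to stay deterministic; a random starting point followed by a local search would be closer to the wording of the paragraph.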

[0011] Optionally, the method further includes: determining whether the topics of each of the plurality of rounds of dialogue meet preset requirements; deleting topics that do not meet the preset requirements from the topics of the plurality of rounds of dialogue; calculating the distance between the semantic vector of the round of dialogue and the semantic vector of other rounds of dialogue for the round of dialogue with the deleted topic; and selecting the topic of other rounds of dialogue as the topic of the round of dialogue based on the calculated distance.

[0012] Optionally, determining whether the topics of each of the multiple rounds of dialogue meet the preset requirements includes: counting the topics that appear repeatedly in the multiple rounds of dialogue in each dialogue to obtain the number of times each topic appears repeatedly in the dialogue; and determining topics whose number of repetitions does not reach the preset number of repetitions as topics that do not meet the preset requirements.

[0013] Optionally, determining whether the topics of each of the multiple rounds of dialogue meet the preset requirements includes: counting the topics that appear consecutively in the multiple rounds of dialogue in each dialogue to obtain the number of consecutive repetitions of each topic in the dialogue; and determining topics whose consecutive repetition count does not reach the preset range of consecutive repetition counts as topics that do not meet the preset requirements.
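The topic filtering and reassignment described in the preceding paragraphs could be sketched as follows. Only the total-count variant is shown; the consecutive-count variant would change the counting step alone. The function name `smooth_topics`, the threshold `min_count`, the choice of the single nearest round, and Euclidean distance are assumptions of this sketch.

```python
import math
from collections import Counter

def smooth_topics(vectors, topics, min_count=2):
    """Replace topics that occur fewer than min_count times with the topic
    of the nearest round whose topic was kept."""
    counts = Counter(topics)
    kept = [i for i, t in enumerate(topics) if counts[t] >= min_count]
    smoothed = list(topics)
    for i, topic in enumerate(topics):
        if counts[topic] < min_count and kept:
            # Nearest round by Euclidean distance between semantic vectors (assumption).
            j = min(kept, key=lambda k: math.dist(vectors[i], vectors[k]))
            smoothed[i] = topics[j]
    return smoothed

# Toy dialogue of five rounds; the topic "misc" appears only once and is dropped.
vectors = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 3.0]]
topics = ["refund", "refund", "misc", "shipping", "shipping"]
smoothed = smooth_topics(vectors, topics)
```

The round that carried the deleted topic inherits "refund" because its vector lies closest to a "refund" round.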

[0014] Optionally, the semantic vector extraction model is a multi-layer Transformer model; wherein, the last layer Transformer is used to predict the semantic vectors of each of the multiple rounds of dialogue according to an autoregressive method, wherein the autoregressive method refers to predicting the semantic vectors of the subsequent rounds of dialogue from the semantic vectors of the earlier rounds of dialogue in a dialogue.

[0015] Optionally, the method further includes: before training the semantic vector extraction model, for each dialogue sample set, concatenating multiple rounds of dialogue samples and additional rounds of dialogue samples in the dialogue sample set of that dialogue to obtain a dialogue sequence sample for that dialogue; inputting the dialogue sequence sample into the semantic vector extraction model for training to obtain a trained semantic vector extraction model; wherein, during training, the last layer of the semantic vector extraction model predicts the semantic vectors of each round of dialogue using the autoregressive method, and the semantic vector extraction model adjusts its parameters based on the prediction results of the additional rounds of dialogue.

[0016] Optionally, the step of using the semantic vectors of the multiple rounds of dialogue to perform clustering calculations and obtain clustering results includes: using the semantic vectors of the multiple rounds of dialogue to perform clustering calculations based on a data density clustering algorithm and obtain clustering results.

[0017] Optionally, the step of using the semantic vectors of the multiple rounds of dialogue to perform clustering calculation using a data density-based clustering algorithm includes: setting the clustering parameters of the data density-based clustering algorithm based on a first clustering precision, and performing clustering calculation using the semantic vectors of the multiple rounds of dialogue to obtain a first clustering result; removing noise points from the multiple rounds of dialogue based on the first clustering result to obtain updated multiple rounds of dialogue; setting the clustering parameters of the data density-based clustering algorithm based on a second clustering precision, and performing clustering calculation using the updated semantic vectors of the multiple rounds of dialogue to obtain a second clustering result, wherein the second clustering precision is greater than the first clustering precision.
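A minimal sketch of the two-pass density-based clustering is given below, using a small self-written DBSCAN so that the example stays self-contained. Treating DBSCAN as the density-based algorithm, and mapping "clustering precision" to the neighbourhood radius `eps` (a smaller radius meaning higher precision), are assumptions of this sketch; the specification does not name the parameters.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point, with -1 meaning noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1            # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # border point previously marked as noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(nbrs)
    return labels

def two_pass_clustering(points, coarse=(2.5, 2), fine=(0.6, 2)):
    """First pass removes noise with loose parameters; second pass re-clusters
    the remaining points with tighter parameters. Returns {original index: label}."""
    first = dbscan(points, *coarse)
    kept = [i for i, label in enumerate(first) if label != -1]
    second = dbscan([points[i] for i in kept], *fine)
    return dict(zip(kept, second))

points = [[0.0, 0.0], [0.5, 0.0], [1.5, 0.0], [2.0, 0.0], [9.0, 9.0]]
result = two_pass_clustering(points)
```

With these toy values the isolated last point is discarded as noise in the first pass, and the second pass separates the remaining four points into two tighter categories.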

[0018] According to a second aspect of the embodiments of this specification, a system for processing dialogue data is provided, comprising: a client configured to send dialogue data to a server, wherein the dialogue data includes multiple rounds of dialogue, to receive key phrases fed back by the server in response to the dialogue data, and to use the key phrases as phrases for dialogue nodes in a dialogue flow model to be constructed; and the server, configured to acquire the dialogue data, extract semantic vectors of each of the multiple rounds of dialogue, perform clustering calculations using the semantic vectors of the multiple rounds of dialogue to obtain clustering results, calculate the vector distance between the rounds of dialogue in each category and the centroid of that category based on the clustering results, select the round of dialogue closest to the centroid from each category as the key phrase based on the vector distance, and send the key phrases to the client.

[0019] According to a third aspect of the embodiments of this specification, a computing device is provided, comprising: a memory and a processor; the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions, wherein the computer-executable instructions, when executed by the processor, implement the steps of the method for processing dialogue data as described in any embodiment of this specification.

[0020] According to a fourth aspect of the embodiments of this specification, a computer-readable storage medium is provided that stores computer-executable instructions, which, when executed by a processor, implement the steps of the method for processing dialogue data as described in any embodiment of this specification.

[0021] This specification provides an embodiment of a method for processing dialogue data. This method acquires dialogue data, which includes multiple rounds of dialogue. It extracts semantic vectors from each of these rounds, performs clustering calculations using these semantic vectors, obtains clustering results, and determines key information corresponding to each category based on the clustering results. Therefore, this method extracts semantic vectors at the dialogue round level and applies unsupervised clustering to extract various key information from the dialogue. It fully leverages the advantages of unsupervised, especially self-supervised, learning, alleviating the demands of supervised learning on data volume and annotation. This allows the task of processing dialogue data to be practically applied to large-scale dialogue data, enabling tasks such as dialogue topic extraction and key word mining to be implemented in real-world scenarios.

Attached Figure Description

[0022] Figure 1 is a schematic diagram illustrating an application scenario of a method for processing dialogue data provided in one embodiment of this specification;

[0023] Figure 2 is a schematic diagram illustrating an application scenario of a method for processing dialogue data provided in another embodiment of this specification;

[0024] Figure 3 is a schematic diagram illustrating an application scenario of a method for processing dialogue data provided in another embodiment of this specification;

[0025] Figure 4 is a flowchart illustrating a method for processing dialogue data according to one embodiment of this specification;

[0026] Figure 5 is a schematic diagram of the structure of a pre-trained dialogue model provided in one embodiment of this specification;

[0027] Figure 6 is a flowchart illustrating the processing procedure of a method for processing dialogue data according to one embodiment of this specification;

[0028] Figure 7 is a flowchart illustrating the processing procedure of a method for processing dialogue data provided in another embodiment of this specification;

[0029] Figure 8 is a schematic diagram of the structure of a device for processing dialogue data according to one embodiment of this specification;

[0030] Figure 9 is a structural block diagram of a computing device provided in one embodiment of this specification.

Detailed Implementation

[0031] Many specific details are set forth in the following description to provide a full understanding of this specification. However, this specification can be implemented in many other ways than those described herein, and those skilled in the art can make similar extensions without departing from the spirit of this specification. Therefore, this specification is not limited to the specific implementations disclosed below.

[0032] The terminology used in one or more embodiments of this specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of this specification. The singular forms “a,” “said,” and “the” as used in one or more embodiments of this specification and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term “and / or” as used in one or more embodiments of this specification refers to and includes any or all possible combinations of one or more associated listed items.

[0033] It should be understood that although the terms first, second, etc., may be used to describe various information in one or more embodiments of this specification, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, first may also be referred to as second without departing from the scope of one or more embodiments of this specification, and similarly, second may also be referred to as first. Depending on the context, the word "if" as used herein may be interpreted as "upon," "when," or "in response to a determination."

[0034] First, the terms and concepts used in one or more embodiments of this specification will be explained.

[0035] Self-supervised learning: falls under the category of unsupervised learning. It is a learning method that trains and optimizes based on inherent properties of the samples themselves, even when the samples lack corresponding labels. In dialogue data scenarios, self-supervised learning can be considered a learning method that trains and optimizes based on the inherent properties of the dialogue text itself, such as predicting the text that immediately follows a given dialogue segment.

[0036] Pretrained language model: refers to a language model trained on large-scale text, such as BERT.

[0037] Conversational pretrained language model: refers to a language model trained on large-scale two-person or multi-person dialogues. It builds on a pretrained language model such as BERT and additionally learns dialogue-specific information such as turn order and roles.

[0038] Topic-aware dialogue segmentation refers to the process of dividing a dialogue into multiple sub-segments based on changes in the topic during each dialogue turn. Each sub-segment contains a unique topic.

[0039] Dialogue Flow Model: A state machine-based dialogue model where nodes include user utterances and system responses, and nodes are connected by edges.

[0040] Dialogue Flow Node Mining: Automatically summarize key user statements and system responses from dialogue logs.

[0041] Clustering: Given a set of data points (which are usually represented by vectors), clustering algorithms are used to divide the data points into multiple categories, assigning each data point to a corresponding category.

[0042] To make the methods provided in the embodiments of this specification easier to understand, the application scenarios involved in these methods are first described schematically with reference to the application scenario diagrams shown in Figures 1-3.

[0043] Figure 1 illustrates an application scenario of a method for processing dialogue data according to an embodiment of this specification. As shown in Figure 1, the application scenario includes a server 110 and a downstream task terminal 120. The server 110 can be used to execute the method for processing dialogue data provided in the embodiments of this specification to obtain key information for each category in the dialogue data. The downstream task terminal 120 can be used to execute downstream tasks related to the key information. For example, the downstream task terminal 120 may include one or more clients or other servers. For example, downstream tasks may include: tasks for building a dialogue flow model, tasks for statistical analysis of dialogue topic distribution, tasks for extracting dialogue keywords, tasks for learning dialogue structure, tasks for dialogue summarization, etc. A communication connection can be established between the server 110 and the downstream task terminal 120 for communication. It is understood that, depending on the needs of the scenario, downstream tasks can also be set in the server 110 and executed by the server 110; this embodiment of the specification does not limit this.

[0044] Below, taking the task of building a dialogue flow model as an example of a downstream task, an application scenario is described with reference to the schematic diagram of an embodiment of this specification shown in Figure 2. It is understood that in artificial intelligence, especially in the field of natural language processing, large-scale pre-trained models have greatly promoted the development and progress of related technologies. In the practical application of natural language processing, dialogue has become an important scenario. Currently, state machine-based dialogue flow models are the mainstream solution for chatbots. Constructing a dialogue flow model typically requires manually reviewing a large amount of dialogue logs, identifying key nodes, and then building the dialogue flow model structure. This entire process is labor-intensive and inefficient. The method for processing dialogue data provided in the embodiments of this specification can effectively solve this problem.

[0045] As shown in Figure 2, a user sends dialogue data containing multiple rounds of dialogue to a server 220 via a client 210 used to build a dialogue flow model. After acquiring the dialogue data, the server 220 extracts the semantic vectors of each of the multiple rounds of dialogue, performs clustering calculations using these semantic vectors, obtains clustering results, determines key information corresponding to each category based on the clustering results, calculates the vector distance between each round of dialogue in each category and the centroid of that category, and selects the round of dialogue closest to the centroid from each category as the key dialogue, which is then sent to the client 210. The client 210 receives the key dialogue fed back by the server 220 for the dialogue data, so that the key dialogue can be used as the dialogue of the dialogue nodes in the dialogue flow model to be built, either on demand by the user or automatically by the client. Therefore, in this application scenario, the method for processing dialogue data provided in the embodiments of this specification achieves automated mining of key dialogue nodes, i.e., key dialogue. Specifically, for example, a large number of dialogue logs can be vectorized and encoded using a pre-trained dialogue model, and key dialogue nodes can be automatically extracted using clustering methods. This lays the foundation for the automatic construction of subsequent dialogue flow models, thereby greatly saving manpower and playing an important role in the rapid construction and deployment of chatbots.

[0046] Next, taking the task of extracting dialogue topics as an example of a downstream task, an application scenario is described with reference to the schematic diagram of another embodiment of this specification shown in Figure 3. It is understood that extracting dialogue topics can assist in completing other downstream tasks such as dialogue topic distribution statistics, dialogue keyword extraction, dialogue structure learning, and dialogue summarization. Therefore, dialogue topic extraction is of great significance in alleviating the difficulty of downstream tasks and improving their effectiveness. However, industry research on this task remains focused on methods represented by supervised learning, resulting in a lack of general methods that can be transferred to downstream applications. Supervised learning methods rely on manual annotation to label the topic of each round of speech within a conversation. Then, the text of each round of speech and its corresponding label are input into a supervised learning model to optimize the cross-entropy loss between the model's output and the annotated labels. Topic labels are generally manually sorted and gradually expanded during the annotation process. The model can be any neural network model capable of modeling multi-turn dialogues, such as convolutional neural networks, recurrent neural networks, or conversational pre-trained language models. However, sorting out topics for labeling requires a lot of manpower and relies on many heuristic rules such as the BM25 algorithm. Moreover, because of the semantic relationships within the context, labeling multi-turn dialogues is much more difficult than labeling a single text, and it is difficult to construct a large-scale training set through manual labeling. Therefore, the embodiments of this specification provide a self-supervised learning method based on dialogue turn clustering to process dialogue data.

[0047] As shown in Figure 3, dialogue data containing multiple rounds of dialogue can be input into a semantic vector extraction model to extract the semantic vectors of each round of dialogue. These semantic vectors are then input into a clustering algorithm. Based on the clustering results, the cluster label of each category is used as the topic for each round of dialogue within that category. Furthermore, after obtaining the topics for each round of dialogue, the dialogue can be segmented according to the topics to obtain dialogue sub-segments corresponding to each topic. These sub-segments can then be sent to downstream tasks for rapid execution. It should be noted that segmenting the dialogue based on topics is an optional step; alternatively, the key information corresponding to each round of dialogue can be directly sent as topics to other downstream tasks. This specification does not impose any restrictions on this approach.
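The optional segmentation step could, for example, group consecutive rounds of dialogue that share the same cluster label into one sub-segment. The following is a minimal sketch; the function name `segment_by_topic` and the sample utterances are illustrative assumptions.

```python
from itertools import groupby

def segment_by_topic(rounds, topics):
    """Group consecutive rounds that share a topic label into sub-segments.

    Returns a list of (topic, [round texts]) tuples in dialogue order.
    """
    segments = []
    for topic, group in groupby(zip(topics, rounds), key=lambda pair: pair[0]):
        segments.append((topic, [text for _, text in group]))
    return segments

rounds = ["Hi", "I need a refund", "What is the order number?",
          "Where is my parcel?", "It ships tomorrow"]
topics = [0, 1, 1, 2, 2]   # cluster labels used as topics
segments = segment_by_topic(rounds, topics)
```

Because `groupby` only merges adjacent items, a topic that reappears later in the dialogue starts a new sub-segment, which matches the idea of dividing a dialogue wherever the topic changes.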

[0048] The methods, apparatus, systems, computing devices, and computer-readable storage media for processing dialogue data provided in this specification will now be described in detail in the following embodiments.

[0049] Referring to Figure 4, which shows a flowchart of a method for processing dialogue data according to an embodiment of this specification, the method specifically includes the following steps.

[0050] Step 402: Obtain dialogue data, wherein the dialogue data includes multiple rounds of dialogue.

[0051] The dialogue data refers to dialogue data in any dialogue scenario, such as between people or between people and machines. The dialogue data can include one or more dialogues. For example, a single dialogue can consist of multiple rounds of conversation between participants in an online meeting. Another example is a dialogue between a user and an intelligent customer service representative, where multiple rounds can include the user's responses and the system's replies. Yet another example is a dialogue between two users in an online chat. Each time a participant speaks in the dialogue data corresponds to one round of conversation. For example, in the processing scenario shown in Figure 6, the dialogue data includes 20 rounds of dialogue.

[0052] Step 404: Extract the semantic vectors of each of the multiple rounds of dialogue.

[0053] To more accurately extract semantic vectors from each round of dialogue, the embodiments of this specification can also perform data cleaning on the dialogue dataset. Specifically, for example, entities such as names and addresses can be normalized, stop words can be removed, and consecutive statements can be merged. After cleaning, the dialogue data can then be further processed for extracting semantic vectors.
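A minimal sketch of such a cleaning pass is shown below. The placeholder table `ENTITY_PLACEHOLDERS` and the stop-word set `STOP_WORDS` are hypothetical stand-ins chosen for this example; a practical system would more likely use a named-entity recognizer and a curated stop-word list.

```python
# Hypothetical lookup tables, used here only for illustration.
ENTITY_PLACEHOLDERS = {"Alice": "[NAME]", "12 Main Street": "[ADDRESS]"}
STOP_WORDS = {"um", "uh", "well"}

def clean_dialogue(turns):
    """turns: list of (speaker, text). Returns the cleaned list of (speaker, text)."""
    cleaned = []
    for speaker, text in turns:
        # 1. Normalize known entities to placeholders.
        for entity, placeholder in ENTITY_PLACEHOLDERS.items():
            text = text.replace(entity, placeholder)
        # 2. Remove stop words (compared case-insensitively, ignoring trailing punctuation).
        words = [w for w in text.split() if w.lower().strip(",.") not in STOP_WORDS]
        text = " ".join(words)
        # 3. Merge consecutive statements made by the same speaker.
        if cleaned and cleaned[-1][0] == speaker:
            cleaned[-1] = (speaker, cleaned[-1][1] + " " + text)
        else:
            cleaned.append((speaker, text))
    return cleaned

turns = [("user", "um my name is Alice"),
         ("user", "I live at 12 Main Street"),
         ("agent", "well thanks")]
cleaned = clean_dialogue(turns)
```

After cleaning, the two consecutive user statements form a single round, so that each round in the cleaned data corresponds to one speaker change.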

[0054] The embodiments in this specification are not limited to the implementation methods for extracting semantic vectors from multiple rounds of dialogue. For example, dialogue data can be input into a semantic vector extraction model to extract semantic vectors. The semantic vector extraction model can be any model used for modeling natural language processing tasks, including but not limited to convolutional neural networks, recurrent neural networks, BERT models, and Transformer-based pre-trained conversational language models.

[0055] In one or more embodiments of this specification, the rounds in the dialogue data can be identified so that the semantic vector extraction model can distinguish the rounds based on these identifiers, thereby extracting semantic vectors more accurately. Specifically, extracting the semantic vector of each round of dialogue in the dialogue data includes:

[0056] for each dialogue, concatenating the multiple rounds of dialogue in that dialogue to obtain the dialogue sequence of that dialogue;

[0057] adding a sequence identifier to the beginning of each dialogue sequence to distinguish the sequence, and adding a corresponding round identifier to each round of dialogue to distinguish the round of dialogue;

[0058] inputting the dialogue sequence into the semantic vector extraction model to extract the semantic vectors of each of the multiple rounds of dialogue.

[0059] For example, dialogue data can include one or more dialogues. For each dialogue, the turns within the entire dialogue can be concatenated. After concatenation, a prefix such as "[CLS]" is added to the beginning of the sequence as a sequence identifier, and a prefix such as "[START]" is added to the beginning of each turn as a turn identifier. After processing the entire sequence, the dialogue sequence can be input into a pre-trained conversational language model, and the turn-by-turn dialogues are distinguished according to the turn identifier "[START]". The vector of each turn is extracted, which is the semantic representation corresponding to each turn in the dialogue.
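The construction of the dialogue sequence in this example could be sketched as follows. Whitespace tokenization stands in for a real tokenizer, and reading each round's vector at the position of its "[START]" marker is an assumption of this sketch; the specification only states that the rounds are distinguished by the round identifier.

```python
def build_dialogue_sequence(rounds, seq_id="[CLS]", round_id="[START]"):
    """Concatenate the rounds of one dialogue, prefixing the sequence and each round."""
    return " ".join([seq_id] + [f"{round_id} {text}" for text in rounds])

def round_marker_positions(sequence, round_id="[START]"):
    """Token positions of the round identifiers (whitespace tokenization, for illustration)."""
    return [i for i, token in enumerate(sequence.split()) if token == round_id]

rounds = ["hello", "hi, how can I help?", "I want a refund"]
sequence = build_dialogue_sequence(rounds)
positions = round_marker_positions(sequence)
```

In a model-based implementation, the hidden states at these positions could then serve as the per-round semantic vectors.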

[0060] It should be noted that the specific implementation of the semantic vector extraction model provided in the embodiments of this specification is not limited. For example, in one or more embodiments, the semantic vector extraction model can be a multi-layer Transformer model; wherein, the last layer of Transformer is used to predict the semantic vectors of each of the multiple rounds of dialogue according to an autoregressive method, wherein the autoregressive method refers to predicting the semantic vectors of the subsequent rounds of dialogue from the semantic vectors of the earlier rounds of dialogue in a dialogue.

[0061] Specifically, for example, in practical applications the semantic vector extraction model can be the pre-trained dialogue model shown in Figure 5. As shown in Figure 5, the structure of the pre-trained dialogue model includes 1 to N Transformer layers and a context Transformer layer, where N is an integer greater than or equal to 2. A single input to the pre-trained dialogue model can be an input sequence formed by multiple turns of dialogue within a single conversation. After passing through a character encoding layer, a character position encoding layer, and a dialogue role encoding layer, the input sequence enters a multi-layer Transformer model, with the last layer being the context Transformer layer. In the 1 to N Transformer layers, the semantic vector output from one layer serves as the input to the next layer, thus progressively improving the accuracy of the semantic vectors. In the context Transformer layer, semantic vectors are predicted autoregressively. For example, for multiple turns of dialogue C1, C2, C3, C2 is predicted from C1, and C3 is predicted from C2, following the dialogue order.

[0062] To further improve the accuracy of the semantic vector extraction model, in one or more embodiments of this specification, additional labeled round-turn dialogue samples can be added to each input sequence, and training can be performed with these labels as the target. Specifically, the method may further include: before training the semantic vector extraction model, for each dialogue sample set, concatenating multiple round-turn dialogue samples from the dialogue sample set of that dialogue and the additionally added round-turn dialogue samples to obtain a dialogue sequence sample for that dialogue; inputting the dialogue sequence sample into the semantic vector extraction model for training to obtain the trained semantic vector extraction model. During training, the last Transformer layer predicts the semantic vectors of each round-turn dialogue using the autoregressive method, and the semantic vector extraction model adjusts its parameters based on the prediction results of the additionally added round-turn dialogues.

[0063] In the above embodiments, for example, the prediction result of the additional dialogue rounds can be whether or not an additional dialogue round is included. The model can have two training objectives: one is to train the vectors of each dialogue round using 1 to N Transformer layers through autoregression; the other is for the last Transformer layer to predict whether a specified round in the input, such as the last round, is an additional dialogue round. The additional dialogue rounds can be real dialogue rounds or randomly generated rounds. Finally, the pre-trained dialogue model outputs each dialogue round using context-based vectorization encoding.
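
One possible reading of the second training objective is sketched below: an extra turn is appended to each dialogue sample, and a binary label records whether that turn is a real turn or a randomly drawn one. The function name, the 50/50 sampling ratio, and the label convention are assumptions for illustration; the document does not specify them.

```python
import random
from typing import List, Tuple


def make_training_sample(
    dialogue: List[str],
    real_turn: str,
    random_pool: List[str],
    rng: random.Random,
) -> Tuple[List[str], int]:
    """Append one additional turn and return (turns, label).

    label is 1 when the appended turn is the real turn and 0 when it was
    drawn at random from random_pool (an assumed labeling convention).
    """
    if rng.random() < 0.5:  # assumed sampling ratio
        return dialogue + [real_turn], 1
    return dialogue + [rng.choice(random_pool)], 0


rng = random.Random(0)
turns, label = make_training_sample(
    ["Where is my order?", "Let me check that for you."],
    "It ships tomorrow.",
    ["The weather is nice.", "I like tea."],
    rng,
)
```

The model would then be trained to predict `label` from the representation of the last turn, alongside the autoregressive objective.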

[0064] Step 406: Perform clustering calculation using the semantic vectors of the multiple rounds of dialogue to obtain the clustering results.

[0065] The clustering algorithm used in the clustering calculation is not limited. For example, it can be the KMeans clustering algorithm, the expectation-maximization (EM) clustering algorithm of a Gaussian mixture model, a data density-based clustering algorithm, etc.

[0066] Taking the KMeans clustering algorithm as an example, after the semantic vectors of multiple rounds of dialogue are extracted, they can be stored in a unified feature pool. The KMeans clustering algorithm is then used to cluster the semantic vectors in the feature pool. The anchor points for the KMeans clustering algorithm can be selected randomly or initialized using KMeans++. The number of clusters can be any value from 1 to M (where M represents the total number of messages). To obtain better clustering results, the number of clusters can be selected based on the scenario requirements, such as an empirical value of 100 to 1000. Once the clustering converges, the clustering results are obtained. For example, the cluster labels for each category can be saved for later use.

[0067] Specifically, for example, the clustering calculation using the semantic vectors of the multiple rounds of dialogue to obtain the clustering result includes:

[0068] The semantic vectors from multiple rounds of dialogue are stored in the feature pool;

[0069] Use the feature pool as the initial segmentation range;

[0070] Randomly select one round of dialogue within the segmentation range as the segmentation point;

[0071] Calculate the difference between the forward aggregation feature and the backward aggregation feature of the split point, and select the position with the largest difference as the optimized split point of the split range;

[0072] The rounds of dialogue within the segmentation range are divided into two parts using the optimized segmentation point;

[0073] The two parts obtained by the split are each used as an updated segmentation range;

[0074] A segmentation range in which the number of rounds of dialogue does not meet the preset dialogue number requirement is marked as a completed segment;

[0075] If the number of optimized segmentation points has not reached the preset number of segmentation points, return, for each updated segmentation range, to the step of randomly selecting a round of dialogue within the segmentation range as the segmentation point;

[0076] If the number of optimized split points reaches the preset requirement, all optimized split points are used as initialization points to initialize KMeans cluster centers (that is, distance difference points are selected from high to low as initialization centers), and the optimized split points are adjusted by the KMeans clustering algorithm and the clustering results are determined based on the adjusted optimized split points.

[0077] Among them, the adjusted optimized segmentation points are the center points of the corresponding categories.
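
The segmentation-based initialization in the steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented procedure itself: the "aggregation feature" is taken to be the mean vector, the "difference" is taken to be Euclidean distance, every position in a range is scanned exhaustively instead of starting from a random split point, and the vector at each optimized split point is used as an initial KMeans center. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mean(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean, used here as the assumed 'aggregation feature'."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def best_split(pool: Sequence[Vector], lo: int, hi: int) -> Tuple[float, int]:
    """Score each position p in (lo, hi) by the distance between the mean of
    pool[lo:p] (forward) and the mean of pool[p:hi] (backward); return the best."""
    return max(
        (math.dist(mean(pool[lo:p]), mean(pool[p:hi])), p) for p in range(lo + 1, hi)
    )


def split_points(pool: Sequence[Vector], num_points: int, min_size: int = 2) -> List[int]:
    """Split ranges repeatedly until num_points split points are collected."""
    ranges = [(0, len(pool))]
    found: List[Tuple[float, int]] = []
    while ranges and len(found) < num_points:
        lo, hi = ranges.pop(0)
        if hi - lo < max(min_size, 2):
            continue  # too few turns: this range counts as completed
        score, p = best_split(pool, lo, hi)
        found.append((score, p))
        ranges += [(lo, p), (p, hi)]
    # Order the split points by forward/backward difference, largest first.
    return [p for _, p in sorted(found, reverse=True)]


def kmeans(pool: Sequence[Vector], centers: Sequence[Vector], iterations: int = 20):
    """Plain Lloyd iterations starting from the given centers."""
    centers = [list(c) for c in centers]
    labels = [0] * len(pool)
    for _ in range(iterations):
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(v, centers[k]))
            for v in pool
        ]
        for k in range(len(centers)):
            members = [v for v, lab in zip(pool, labels) if lab == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = mean(members)
    return labels, centers


# Toy feature pool: two well-separated groups of three vectors each.
pool = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
points = split_points(pool, num_points=2)
labels, centers = kmeans(pool, [pool[p] for p in points])
```

On this toy pool the first split falls between the two groups, so the two initial centers land in different groups and KMeans converges after a single update.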

[0078] Taking a data density-based clustering algorithm as an example, the step of using the semantic vectors of the multiple rounds of dialogue to perform clustering calculations and obtain clustering results can include: using the semantic vectors of the multiple rounds of dialogue to perform clustering calculations based on a data density-based clustering algorithm to obtain clustering results. By using a density-based clustering algorithm, such as the HDBSCAN algorithm, it is not necessary to specify the number of categories in advance, resulting in better clustering performance.

[0079] To further improve clustering accuracy, in one or more embodiments of this specification, multiple clustering calculations can be performed with different clustering accuracies, thereby removing noisy data points in the data while gradually improving accuracy. For example, taking two clustering calculations as an example, the clustering calculation based on data density using the semantic vectors of the multiple rounds of dialogue can include:

[0080] The clustering parameters of the data density-based clustering algorithm are set based on the first clustering accuracy, and the semantic vectors of the multiple rounds of dialogue are used to perform clustering calculations to obtain the first clustering result.

[0081] Based on the first clustering result, noise points in the multiple rounds of dialogue are removed to obtain updated multiple rounds of dialogue;

[0082] The clustering parameters of the data density-based clustering algorithm are set based on the second clustering accuracy, and clustering calculation is performed using the updated semantic vectors of multiple rounds of dialogue to obtain the second clustering result, wherein the second clustering accuracy is greater than the first clustering accuracy.

[0083] In the above embodiment, it is equivalent to performing coarse clustering by controlling parameters during the first clustering calculation in order to remove noisy data points in the data, and performing fine clustering on the data filtered by the first clustering during the second clustering calculation by controlling parameters to obtain the clustering result.
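
The coarse-then-fine procedure can be sketched as below. To keep the example self-contained, a minimal plain DBSCAN is written out and used in place of HDBSCAN, which is the algorithm named in the text. Mapping "clustering accuracy" to a smaller neighborhood radius `eps` is an assumption made for illustration, as are all function and parameter names.

```python
import math
from typing import List, Sequence

NOISE = -1


def dbscan(points: Sequence[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Minimal DBSCAN: one label per point, NOISE (-1) for outliers."""
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels


def two_pass_cluster(points, coarse: dict, fine: dict):
    """Coarse pass removes noise; fine pass clusters the remaining points."""
    first = dbscan(points, **coarse)
    kept = [i for i, lab in enumerate(first) if lab != NOISE]
    second = dbscan([points[i] for i in kept], **fine)
    return kept, second


# Toy vectors: two tight groups that sit close together, plus one outlier.
vectors = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
           [1.0, 0.0], [1.1, 0.0], [1.0, 0.1],
           [10.0, 10.0]]
kept, second = two_pass_cluster(
    vectors,
    coarse={"eps": 1.5, "min_samples": 3},
    fine={"eps": 0.3, "min_samples": 3},
)
```

Here the coarse pass merges both groups into one cluster and flags only the outlier as noise; the fine pass then separates the two groups.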

[0084] The clustering results can be represented as multiple categories, each with its own centroid.

[0085] Step 408: Based on the clustering results, determine the key information corresponding to each category.

[0086] The key information is related to the category, but the specific content is not limited. The method of obtaining it can be set as needed in different application scenarios.

[0087] For example, in one or more embodiments of this specification, the key information is a topic, and determining the key information corresponding to each category based on the clustering results includes:

[0088] Based on the clustering results, the clustering label of each category is used as the topic for each round of dialogue in that category.

[0089] For example, the dialogue data can be segmented by utilizing the topics corresponding to each of the multiple rounds of dialogue.

[0090] It is understandable that obtaining the topics of multiple rounds of dialogue is equivalent to obtaining the correspondence between the rounds of dialogue and the topics. Some rounds of dialogue may have the same topic, some may have different topics, and some may have topics that appear so infrequently that they are difficult to use as topics in practical applications. Therefore, to perform tasks such as topic segmentation more accurately later, one or more embodiments of this specification may further filter and integrate the topics of multiple rounds of dialogue. Specifically, for example, the method may further include:

[0091] Determine whether the topics of each of the multiple rounds of dialogue meet preset requirements;

[0093] From the topics of the multiple rounds of dialogue, delete topics that do not meet the preset requirements;

[0094] For each round of dialogue where a topic has been deleted, calculate the distance between the semantic vector of that round of dialogue and the semantic vector of other rounds of dialogue.

[0095] Based on the calculated distance, select topics from other rounds of dialogue as the topic for this round of dialogue.

[0096] The preset requirements can be set according to the actual application scenario. For example, in some application scenarios, the preset requirements may include a topic blacklist or whitelist preset based on experience, and the satisfaction of the preset requirements is determined based on the topic blacklist or whitelist. As another example, in some application scenarios, the preset requirements may include a preset frequency threshold range, and the satisfaction of the preset requirements is determined based on the frequency of topic occurrence.

[0097] Specifically, for example, determining whether the topics of each of the multiple rounds of dialogue meet preset requirements may include:

[0098] The recurring topics in multiple rounds of dialogue within each conversation are counted to obtain the number of times each topic is repeated in that conversation.

[0099] Topics whose number of occurrences does not reach the preset number of repetitions are identified as topics that do not meet the preset requirements.

[0100] Taking the dialogue topic segmentation application scenario shown in Figure 6 as an example, assume that dialogue rounds 1 to 20 constitute a single dialogue. "Topic 1" appears 6 times, "Topic 2" appears 5 times, "Topic 3" appears 6 times, and "Topic 4" appears 3 times. Since the number of occurrences of "Topic 4" does not satisfy the preset repetition count requirement of "greater than or equal to 5", "Topic 4" is deleted from dialogue rounds 10 to 12. The topic of dialogue rounds 10 to 12 is then reassigned based on the vector distance between those rounds and the other dialogue rounds in the entire dialogue, for example, by changing it to "Topic 1".
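
The frequency-based filtering and the distance-based reassignment can be sketched as follows. The toy topics and one-dimensional vectors are hypothetical and are not the data of Figure 6; Euclidean distance and "nearest surviving turn" are assumed as the distance measure and selection rule.

```python
import math
from collections import Counter
from typing import List, Sequence


def filter_rare_topics(
    topics: List[str], vectors: Sequence[Sequence[float]], min_count: int
) -> List[str]:
    """Drop topics occurring fewer than min_count times in one dialogue and
    give each affected turn the topic of its nearest surviving turn."""
    counts = Counter(topics)
    keep = [counts[t] >= min_count for t in topics]
    if not any(keep):
        return list(topics)  # nothing survives: leave the topics unchanged
    survivors = [j for j, ok in enumerate(keep) if ok]
    result = list(topics)
    for i, ok in enumerate(keep):
        if not ok:
            nearest = min(survivors, key=lambda j: math.dist(vectors[i], vectors[j]))
            result[i] = topics[nearest]
    return result


# Hypothetical dialogue of seven turns; "T4" occurs only once.
topics = ["T1", "T1", "T1", "T4", "T2", "T2", "T2"]
vectors = [[0.0], [0.1], [0.2], [0.35], [1.0], [1.1], [1.2]]
cleaned = filter_rare_topics(topics, vectors, min_count=3)
```

The turn labeled "T4" lies closest to a "T1" turn, so it is reassigned to "T1".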

[0101] For example, determining whether the topics of each of the multiple rounds of dialogue meet preset requirements may include:

[0102] The number of times each topic is repeated in a conversation is obtained by counting the topics that appear consecutively in multiple rounds of conversation within each conversation.

[0103] Topics that do not meet the preset number of consecutive repetitions are identified as topics that do not meet the preset requirements.

[0104] Taking the dialogue topic segmentation application scenario shown in Figure 6 as an example, the consecutive repetition count of "Topic 1" is 3, the consecutive repetition counts of "Topic 2" are 2 and 1, the consecutive repetition count of "Topic 3" is 3, and the consecutive repetition count of "Topic 4" is 3. It can be seen that the consecutive repetition count of "Topic 2" does not reach the preset consecutive repetition count range of "greater than or equal to 3". Therefore, "Topic 2" is deleted. The topic is then reassigned according to the vector distance between "Round Dialogue 4", "Round Dialogue 5", "Round Dialogue 9", "Round Dialogue 13", "Round Dialogue 20" and the other round dialogues in the whole dialogue, for example, by changing it to "Topic 3".
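
The consecutive-repetition check can be sketched as below. It assumes that a topic satisfies the requirement if at least one of its runs reaches the threshold; the text does not state whether the longest run or every run is compared, so this is one possible interpretation. The topic sequence is a hypothetical example, not the sequence of Figure 6.

```python
from itertools import groupby
from typing import Dict, List, Set


def max_run_lengths(topics: List[str]) -> Dict[str, int]:
    """Longest run of consecutive turns for each topic in one dialogue."""
    runs: Dict[str, int] = {}
    for topic, group in groupby(topics):
        length = sum(1 for _ in group)
        runs[topic] = max(runs.get(topic, 0), length)
    return runs


def topics_below_run_threshold(topics: List[str], min_run: int) -> Set[str]:
    """Topics whose longest consecutive run is shorter than min_run."""
    return {t for t, r in max_run_lengths(topics).items() if r < min_run}


# Hypothetical sequence: "T2" appears in runs of length 2 and 1 only.
topics = ["T1"] * 3 + ["T2"] * 2 + ["T3"] * 3 + ["T2"] + ["T4"] * 3
runs = max_run_lengths(topics)
rejected = topics_below_run_threshold(topics, min_run=3)
```

The turns carrying a rejected topic can then be reassigned by vector distance, as described in paragraphs [0094] and [0095].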

[0105] For example, in a scenario combining the two implementation methods described above, topics that do not reach the required number of repetitions can be discarded first, based on a preset range of repetitions, and then topics that do not reach the required number of consecutive repetitions can be discarded, based on a preset range of consecutive repetitions. For instance, after discarding and modifying topics according to the above example, the topic for rounds 1 to 3 is "Topic 1", the topic for rounds 4 to 9 is "Topic 3", the topic for rounds 10 to 12 is "Topic 1", the topic for round 13 is "Topic 3", the topic for rounds 14 to 16 is "Topic 1", and the topic for rounds 17 to 20 is "Topic 3".

[0106] In the above embodiments, for example, the topics corresponding to the dialogue rounds can be analyzed separately for each dialogue. Within each dialogue, topics whose number of occurrences is below a certain threshold and/or topics whose number of consecutive occurrences is below a certain threshold can be discarded. After a topic is discarded, the topic at that position can be replaced, according to the vector distance, by the topic of the adjacent context, so that the extracted topics are more reasonable.

[0107] This method acquires dialogue data, which includes multiple rounds of dialogue. It extracts semantic vectors from each of these rounds, performs clustering calculations using these semantic vectors, and determines the key information for each category based on the clustering results. Therefore, this method extracts semantic vectors at the dialogue round level and applies unsupervised clustering to extract key information from the dialogue. It fully leverages the advantages of unsupervised, especially self-supervised, learning, alleviating the demands of supervised learning on data volume and annotation. This allows dialogue data processing to be practically applied to large-scale dialogue data, enabling tasks such as dialogue topic extraction and dialogue flow model construction to be implemented in real-world scenarios.

[0108] The method for processing dialogue data is further explained below with reference to Figure 6, taking its application in the dialogue topic segmentation scenario as an example. Figure 6 shows a flowchart of a method for processing dialogue data according to an embodiment of this specification, which specifically includes the following steps.

[0109] Step 602: Obtain dialogue data, wherein the dialogue data includes multiple rounds of dialogue.

[0110] Step 604: Extract the semantic vectors of each of the multiple rounds of dialogue using a semantic vector extraction model.

[0111] Step 606: Perform clustering calculations using the semantic vectors of each of the multiple rounds of dialogue to obtain the clustering results.

[0112] Step 608: Based on the clustering results, use the clustering label of each category as the topic corresponding to each round of dialogue in that category.

[0113] Step 610: Use the topics corresponding to each of the multiple rounds of dialogue to segment the dialogue data into topics.
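
Step 610 can be illustrated with a short sketch that groups consecutive rounds sharing the same topic into segments. The function name and the 0-based, inclusive index convention are assumptions for illustration.

```python
from itertools import groupby
from typing import List, Tuple


def segment_by_topic(topics: List[str]) -> List[Tuple[str, int, int]]:
    """Return (topic, first_index, last_index) for each run of equal topics."""
    segments = []
    start = 0
    for topic, group in groupby(topics):
        length = sum(1 for _ in group)
        segments.append((topic, start, start + length - 1))
        start += length
    return segments


# Hypothetical per-round topics for one dialogue.
segments = segment_by_topic(["T1", "T1", "T3", "T3", "T3", "T1"])
```

Each tuple marks one topic segment of the dialogue, so the same topic may yield several segments if it recurs later in the dialogue.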

[0114] In this embodiment, by performing topic segmentation, the multiple stages of the dialogue are divided into sub-segments according to their topics, which can more effectively assist in completing downstream tasks such as dialogue topic distribution statistics, dialogue keyword extraction, dialogue structure learning, and dialogue summarization, thereby alleviating the difficulty of downstream tasks and improving their effectiveness.

[0115] The method for processing dialogue data is further explained below with reference to Figure 7, taking its application in the dialogue flow model construction scenario as an example. Figure 7 shows a flowchart of a method for processing dialogue data according to an embodiment of this specification, which specifically includes the following steps.

[0116] Step 702: Obtain dialogue data, wherein the dialogue data includes multiple rounds of dialogue.

[0117] Step 704: Extract the semantic vectors of each of the multiple rounds of dialogue using a semantic vector extraction model.

[0118] Step 706: Perform clustering calculations using the semantic vectors of each of the multiple rounds of dialogue to obtain the clustering results.

[0119] Step 708: Based on the clustering results, calculate the vector distance between each round of dialogue in each category and the center point of that category, and select the round of dialogue closest to the center point from each category as the key dialogue based on the vector distance.

[0120] Additionally, the key dialogues can be pushed to the dialogue construction module, so that the dialogue construction module can use the key dialogues as the scripts for dialogue nodes in the dialogue flow model to be constructed. The dialogue construction module can be located on the client side or on the server side where the method for processing dialogue data is applied; this embodiment of the specification does not impose any limitations on this.
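
The selection in Step 708 can be sketched as follows, assuming Euclidean distance as the vector distance and integer cluster labels that index into a list of center points. The function name and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def select_key_dialogues(
    turns: List[str],
    vectors: Sequence[Sequence[float]],
    labels: List[int],
    centers: Sequence[Sequence[float]],
) -> Dict[int, str]:
    """For each category, pick the turn whose vector is closest to the center."""
    best = {}
    for turn, vec, lab in zip(turns, vectors, labels):
        d = math.dist(vec, centers[lab])
        if lab not in best or d < best[lab][0]:
            best[lab] = (d, turn)
    return {lab: turn for lab, (_, turn) in best.items()}


# Hypothetical turns with one-dimensional vectors and two categories.
turns = ["hi", "hello there", "bye", "see you"]
vectors = [[0.0], [0.2], [1.0], [1.3]]
labels = [0, 0, 1, 1]
centers = [[0.05], [1.25]]
key_dialogues = select_key_dialogues(turns, vectors, labels, centers)
```

The resulting mapping from category to key dialogue is what would be pushed to the dialogue construction module.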

[0121] Corresponding to the above method embodiments, this specification also provides embodiments of an apparatus for processing dialogue data. Figure 8 shows a schematic diagram of a device for processing dialogue data according to one embodiment of this specification. As shown in Figure 8, the device includes:

[0122] The dialogue acquisition module 802 can be configured to acquire dialogue data, wherein the dialogue data includes multiple rounds of dialogue.

[0123] The feature extraction module 804 can be configured to extract the semantic vectors of each of the multiple rounds of dialogue.

[0124] The clustering calculation module 806 can be configured to perform clustering calculations using the semantic vectors of the multiple rounds of dialogue to obtain clustering results.

[0125] The information determination module 808 can be configured to determine the key information corresponding to each category based on the clustering results.

[0126] Because this device acquires dialogue data, which includes multiple rounds of dialogue, extracts semantic vectors from each of these rounds, performs clustering calculations using these semantic vectors, and determines the key information corresponding to each category based on the clustering results, this device extracts semantic vectors at the dialogue round level and applies unsupervised clustering methods to extract key information from the dialogue. This fully leverages the advantages of unsupervised, especially self-supervised, learning, alleviating the demands of supervised learning on data volume and annotation. This allows tasks such as dialogue topic extraction and dialogue flow model construction to be practically applied to large-scale dialogue data and implemented in real-world scenarios.

[0127] In one or more embodiments of this specification, the information determination module 808 can be configured to use the clustering label of each category as the topic corresponding to each round of dialogue in that category, based on the clustering results.

[0128] In one or more embodiments of this specification, the information determination module 808 may include:

[0129] The distance calculation submodule can be configured to calculate the vector distance between each round of dialogue in each category and the centroid of that category, based on the clustering results.

[0130] The dialogue mining submodule can be configured to select the round of dialogue closest to the center point from each category as the key dialogue based on the vector distance.

[0131] In one or more embodiments of this specification, the apparatus may further include: a script push module, which can be configured to push the key dialogue to the dialogue construction module, so that the dialogue construction module uses the key dialogue as the script of the dialogue node in the dialogue flow model to be constructed.

[0132] In one or more embodiments of this specification, the feature extraction module 804 may include:

[0133] The dialogue splicing submodule can be configured to splice multiple rounds of dialogue in each dialogue to obtain the dialogue sequence of that dialogue.

[0134] The identifier addition submodule can be configured to add a sequence identifier to the beginning of each dialogue sequence to distinguish the sequence, and to add a corresponding round identifier to each round of dialogue to distinguish the round of dialogue.

[0135] The feature extraction submodule can be configured to input the dialogue sequence into the semantic vector extraction model to extract the semantic vectors of each of the multiple rounds of dialogue.

[0136] In one or more embodiments of this specification, the clustering calculation module 806 may include:

[0137] The feature setting submodule can be configured to store semantic vectors from multiple rounds of dialogue into a feature pool;

[0138] The range initialization submodule can be configured to use the feature pool as the initial segmentation range;

[0139] The segmentation selection submodule can be configured to randomly select a round of dialogue as the segmentation point within the segmentation range;

[0140] The optimization point selection submodule can be configured to calculate the difference between the forward aggregation feature and the backward aggregation feature of the split point, and select the position with the largest difference as the optimized split point of the split range;

[0141] The segmentation execution submodule can be configured to divide the rounds of dialogue within the segmentation range into two parts using the optimized segmentation point;

[0142] The range update submodule can be configured to use the two parts of the split dialogue as the update range respectively;

[0143] The granularity control submodule can be configured to mark a segmentation range in which the number of rounds of dialogue does not meet the preset dialogue number requirement as a completed segment;

[0144] The continued segmentation submodule can be configured to, if the number of optimized segmentation points has not reached the preset number of segmentation points, trigger the segmentation selection submodule to return, for each updated segmentation range, to the step of randomly selecting a round of dialogue within the segmentation range as the segmentation point.

[0145] The clustering adjustment submodule can be configured to, if the number of optimized split points reaches a preset requirement, obtain all optimized split points as initialization points to initialize KMeans cluster centers, adjust the optimized split points through the KMeans clustering algorithm, and determine the clustering result based on the adjusted optimized split points.

[0146] In one or more embodiments of this specification, the apparatus may further include:

[0147] The judgment module can be configured to determine whether the topics of each of the multiple rounds of dialogue meet preset requirements.

[0148] The key information deletion module can be configured to delete topics that do not meet the preset requirements from the topics of the multiple rounds of dialogue.

[0149] The feature distance calculation module can be configured to calculate the distance between the semantic vector of a round of dialogue for a deleted topic and the semantic vector of other rounds of dialogue.

[0150] The key information substitution module can be configured to select topics from other rounds of dialogue as the topic for this round of dialogue based on the calculated distance.

[0151] In one or more embodiments of this specification, the judgment module can be configured to count the repeated topics in multiple rounds of dialogue in each dialogue, obtain the number of times each topic is repeated in the dialogue, and determine the topics whose number of repeated occurrences does not reach the preset number of repeated occurrences as topics that do not meet the preset requirements.

[0152] In one or more embodiments of this specification, the judgment module can be configured to count the topics that appear consecutively in multiple rounds of dialogue in each dialogue, obtain the number of consecutive repetitions of each topic in the dialogue, and determine the topics whose consecutive repetition count does not reach the preset range of consecutive repetition count as topics that do not meet the preset requirements.

[0153] In one or more embodiments of this specification, the semantic vector extraction model is a multi-layer Transformer model; wherein, the last layer Transformer is used to predict the semantic vectors of the multiple rounds of dialogue according to an autoregressive method, wherein the autoregressive method refers to predicting the semantic vectors of the subsequent rounds of dialogue according to the dialogue order of the multiple rounds of dialogue in a dialogue, based on the semantic vectors of the earlier rounds of dialogue.

[0154] Accordingly, the device may further include:

[0155] The sample acquisition module can be configured to, before training the semantic vector extraction model, concatenate multiple rounds of dialogue samples and additional rounds of dialogue samples in the dialogue sample set of each dialogue to obtain the dialogue sequence sample of that dialogue.

[0156] The model training module can be configured to input the dialogue sequence samples into the semantic vector extraction model for training, thereby obtaining the trained semantic vector extraction model; wherein, during the training of the semantic vector extraction model, the last layer Transformer predicts the semantic vectors of each round of dialogue through the autoregressive method, and the semantic vector extraction model adjusts the model parameters based on the prediction results of additional rounds of dialogue.

[0157] In one or more embodiments of this specification, the clustering calculation module 806 can be configured to use the semantic vectors of the multiple rounds of dialogue to perform clustering calculations based on a data density-based clustering algorithm to obtain clustering results.

[0158] For example, the clustering calculation module 806 may include:

[0159] The first clustering submodule can be configured to set the clustering parameters of the data density-based clustering algorithm based on the first clustering accuracy, and to perform clustering calculations using the semantic vectors of the multiple rounds of dialogue to obtain the first clustering result.

[0160] The noise reduction submodule can be configured to remove noise points from the multiple rounds of dialogue based on the first clustering result, thereby obtaining updated multiple rounds of dialogue.

[0161] The second clustering submodule can be configured to set the clustering parameters of the data density-based clustering algorithm based on the second clustering accuracy, and perform clustering calculations using the updated semantic vectors of multiple rounds of dialogue to obtain a second clustering result, wherein the second clustering accuracy is greater than the first clustering accuracy.

[0162] The above is an illustrative scheme of an apparatus for processing dialogue data according to this embodiment. It should be noted that the technical solution of this apparatus for processing dialogue data and the technical solution of the method for processing dialogue data described above belong to the same concept. For details not described in detail in the technical solution of the apparatus for processing dialogue data, please refer to the description of the technical solution of the method for processing dialogue data described above.

[0163] Corresponding to the above method embodiments, this specification also provides a system embodiment for processing dialogue data, the structure of which can be seen in the application scenario shown in Figure 1. As shown in Figure 1, the system may include:

[0164] The client is configured to send dialogue data to the server, wherein the dialogue data includes multiple rounds of dialogue, and to receive key dialogues from the server in response to the dialogue data, and to use the key dialogues as dialogue nodes in the dialogue flow model to be constructed.

[0165] The server is configured to acquire dialogue data, extract the semantic vectors of each of the multiple rounds of dialogue, perform clustering calculations using the semantic vectors of the multiple rounds of dialogue to obtain clustering results, calculate the vector distance between the rounds of dialogue in each category and the centroid of that category based on the clustering results, select the round of dialogue closest to the centroid from each category as the key dialogue based on the vector distance, and send the key dialogue to the client.

[0166] The above is an illustrative scheme of a system for processing dialogue data according to this embodiment. It should be noted that the technical solution of this system for processing dialogue data and the technical solution of the method for processing dialogue data described above belong to the same concept. For details not described in detail in the technical solution of the system for processing dialogue data, please refer to the description of the technical solution of the method for processing dialogue data described above.

[0167] Figure 9 shows a structural block diagram of a computing device 900 according to one embodiment of this specification. The components of the computing device 900 include, but are not limited to, a memory 910 and a processor 920. The processor 920 is connected to the memory 910 via a bus 930, and a database 950 is used to store data.

[0168] The computing device 900 also includes an access device 940, which enables the computing device 900 to communicate via one or more networks 960. Examples of these networks include a Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 940 may include one or more of any type of wired or wireless network interface (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Wi-MAX interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so on.

[0169] In one embodiment of this specification, the aforementioned components of the computing device 900 and other components not shown in Figure 9 can also be connected to each other, for example, via a bus. It should be understood that the block diagram of the computing device shown in Figure 9 is for illustrative purposes only and is not intended to limit the scope of this specification. Those skilled in the art can add or replace other components as needed.

[0170] The computing device 900 can be any type of stationary or mobile computing device, including mobile computers or mobile computing devices (e.g., tablet computers, personal digital assistants, laptop computers, notebook computers, netbooks, etc.), mobile phones (e.g., smartphones), wearable computing devices (e.g., smartwatches, smart glasses, etc.) or other types of mobile devices, or stationary computing devices such as desktop computers or PCs. The computing device 900 can also be a mobile or stationary server.

[0171] The processor 920 is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the method for processing dialogue data described above. For example, these include:

[0172] Acquire dialogue data, wherein the dialogue data includes multiple rounds of dialogue;

[0173] Extract the semantic vectors of each of the multiple rounds of dialogue;

[0174] Perform clustering calculations using the semantic vectors of the multiple rounds of dialogue to obtain clustering results;

[0175] Determine, based on the clustering results, the key information corresponding to each category.
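
The four steps above can be sketched as a small pipeline. This is a minimal illustration only, not the implementation of this specification: the function name `process_dialogue_data` and the stand-ins `toy_embed` and `toy_cluster` are hypothetical, and a real system would substitute a trained semantic vector extraction model and an actual clustering algorithm.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def process_dialogue_data(
    rounds: Sequence[str],
    embed: Callable[[str], Vector],
    cluster: Callable[[List[Vector]], List[int]],
) -> Dict[int, List[str]]:
    """Acquire rounds, embed them, cluster them, and group rounds by category."""
    # Step 1: the rounds of dialogue are supplied by the caller.
    # Step 2: extract one semantic vector per round.
    vectors = [embed(text) for text in rounds]
    # Step 3: obtain one cluster label per round.
    labels = cluster(vectors)
    # Step 4: group rounds per category; key information is derived per group.
    categories: Dict[int, List[str]] = defaultdict(list)
    for text, label in zip(rounds, labels):
        categories[label].append(text)
    return dict(categories)


# Hypothetical stand-ins, for illustration only.
def toy_embed(text: str) -> Vector:
    # Two crude features: does the round mention "refund" or "delivery"?
    lowered = text.lower()
    return [float("refund" in lowered), float("delivery" in lowered)]


def toy_cluster(vectors: List[Vector]) -> List[int]:
    # Assign each vector to the index of its largest feature.
    return [max(range(len(v)), key=lambda i: v[i]) for v in vectors]


rounds = [
    "I want a refund for my order",
    "When will the delivery arrive?",
    "How long does a refund take?",
]
result = process_dialogue_data(rounds, toy_embed, toy_cluster)
```

Passing the embedding and clustering steps as callables keeps the data flow visible while leaving both components replaceable.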

[0176] The above is an illustrative scheme of a computing device according to this embodiment. It should be noted that the technical solution of this computing device and the technical solution of the method for processing dialogue data described above belong to the same concept. For details not described in detail in the technical solution of the computing device, please refer to the description of the technical solution of the method for processing dialogue data described above.

[0177] An embodiment of this specification also provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the method for processing dialogue data described above. For example, the steps include:

[0178] Acquire dialogue data, wherein the dialogue data includes multiple rounds of dialogue;

[0179] Extract the semantic vectors of each of the multiple rounds of dialogue;

[0180] Perform clustering calculations using the semantic vectors of the multiple rounds of dialogue to obtain clustering results;

[0181] Determine, based on the clustering results, the key information corresponding to each category.

[0182] The above is an illustrative embodiment of a computer-readable storage medium according to this invention. It should be noted that the technical solution of this storage medium and the technical solution of the method for processing dialogue data described above belong to the same concept. Details not described in detail in the technical solution of the storage medium can be found in the description of the technical solution of the method for processing dialogue data described above.

[0183] An embodiment of this specification also provides a computer program that, when executed in a computer, causes the computer to perform the steps of the method for processing dialogue data described above. For example, the steps include:

[0184] Acquire dialogue data, wherein the dialogue data includes multiple rounds of dialogue;

[0185] Extract the semantic vectors of each of the multiple rounds of dialogue;

[0186] Perform clustering calculations using the semantic vectors of the multiple rounds of dialogue to obtain clustering results;

[0187] Determine, based on the clustering results, the key information corresponding to each category.

[0188] The above is an illustrative example of a computer program according to this embodiment. It should be noted that the technical solution of this computer program and the technical solution of the method for processing dialogue data described above belong to the same concept. Details not described in detail in the technical solution of the computer program can be found in the description of the technical solution of the method for processing dialogue data described above.

[0189] The foregoing has described specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired result. In some embodiments, multitasking and parallel processing are possible or may be advantageous.

[0190] The computer instructions include computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording media, USB flash drive, portable hard drive, magnetic disk, optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media, etc. It should be noted that the content included in the computer-readable medium may be appropriately expanded or reduced according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.

[0191] It should be noted that, for the sake of simplicity, the foregoing method embodiments are all described as a series of actions. However, those skilled in the art should understand that the embodiments in this specification are not limited to the described order of actions, because according to the embodiments in this specification, some steps can be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in this specification are all preferred embodiments, and the actions and modules involved are not necessarily essential to the embodiments in this specification.

[0192] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions in other embodiments.

[0193] The preferred embodiments disclosed above are merely illustrative of this specification. The optional embodiments do not exhaustively describe all details, nor do they limit the invention to the specific implementations described. Clearly, many modifications and variations can be made based on the embodiments described herein. These embodiments are selected and specifically described in this specification to better explain the principles and practical applications of the embodiments, thereby enabling those skilled in the art to better understand and utilize this specification. This specification is limited only by the claims and their full scope and equivalents.

Claims

1. A method for processing dialogue data, comprising: acquiring dialogue data, wherein the dialogue data includes multiple rounds of dialogue; extracting the semantic vector of each of the multiple rounds of dialogue; performing clustering calculations using the semantic vectors of the multiple rounds of dialogue to obtain clustering results; determining, based on the clustering results, key information corresponding to each category, wherein the key information is a topic; determining whether the topics of the multiple rounds of dialogue meet preset requirements; deleting topics that do not meet the preset requirements from the topics of the multiple rounds of dialogue; for each round of dialogue whose topic was deleted, calculating the distance between the semantic vector of that round of dialogue and the semantic vectors of other rounds of dialogue; and selecting, based on the calculated distance, the topic of another round of dialogue as the topic of that round of dialogue; wherein determining whether the topics of the multiple rounds of dialogue meet the preset requirements includes: for each dialogue, counting the topics that appear repeatedly among the multiple rounds of dialogue to obtain the number of times each topic is repeated in the dialogue, and determining topics whose number of repetitions does not reach a preset number of repetitions as topics that do not meet the preset requirements.
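
The topic filtering and reassignment recited in claim 1 can be illustrated with a short sketch. Several points are assumptions made for illustration and are not stated in the claim: the function name `smooth_topics` is hypothetical, "number of repetitions" is read as the total number of occurrences of a topic within one dialogue, the distance is Euclidean, and a round keeps its original topic when no other round has a surviving topic.

```python
import math
from collections import Counter
from typing import List, Sequence


def smooth_topics(
    topics: Sequence[int],
    vectors: Sequence[Sequence[float]],
    min_repeats: int,
) -> List[int]:
    """Replace rare topics with the topic of the nearest round that kept its topic."""
    counts = Counter(topics)
    # A topic survives only if it occurs at least min_repeats times (assumption).
    is_kept = [counts[t] >= min_repeats for t in topics]
    donors = [j for j, kept in enumerate(is_kept) if kept]
    result = []
    for i, kept in enumerate(is_kept):
        if kept or not donors:
            # Topic meets the requirement, or nothing is available to borrow from.
            result.append(topics[i])
            continue
        # Borrow the topic of the closest round (Euclidean distance, assumption).
        nearest = min(donors, key=lambda j: math.dist(vectors[i], vectors[j]))
        result.append(topics[nearest])
    return result


# One dialogue with six rounds; topic 1 appears only once.
topics = [0, 0, 1, 0, 2, 2]
vectors = [[0.0], [0.1], [0.2], [0.3], [1.0], [1.1]]
smoothed = smooth_topics(topics, vectors, min_repeats=2)
```

Here the isolated topic 1 is replaced by topic 0, because the closest rounds in vector space carry topic 0.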

2. The method according to claim 1, wherein determining the key information corresponding to each category based on the clustering results includes: using, based on the clustering results, the cluster label of each category as the topic of each round of dialogue in that category.

3. The method according to claim 1, wherein the key information is key dialogue, and determining the key information corresponding to each category based on the clustering results includes: calculating, based on the clustering results, the vector distance between each round of dialogue in each category and the centroid of that category; and selecting, based on the vector distance, the round of dialogue closest to the centroid from each category as the key dialogue.
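
A minimal sketch of the centroid-based selection in claim 3 follows. The function name `select_key_dialogues` is hypothetical, and the sketch assumes the centroid is the arithmetic mean of the vectors in a category and that the vector distance is Euclidean; the claim itself does not fix either choice.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def select_key_dialogues(
    rounds: Sequence[str],
    vectors: Sequence[Sequence[float]],
    labels: Sequence[int],
) -> Dict[int, str]:
    """For each category, return the round whose vector is closest to the centroid."""
    members: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        members[label].append(index)

    selected: Dict[int, str] = {}
    for label, indices in members.items():
        dim = len(vectors[indices[0]])
        # Centroid as the per-dimension mean of the category (assumption).
        centroid = [
            sum(vectors[i][d] for i in indices) / len(indices) for d in range(dim)
        ]
        closest = min(indices, key=lambda i: math.dist(vectors[i], centroid))
        selected[label] = rounds[closest]
    return selected


rounds = ["r0", "r1", "r2", "r3", "r4", "r5"]
vectors = [[0, 0], [1, 0], [5, 0], [10, 10], [11, 10], [15, 10]]
labels = [0, 0, 0, 1, 1, 1]
key = select_key_dialogues(rounds, vectors, labels)
```

In this example the centroids are (2, 0) and (12, 10), so the rounds at (1, 0) and (11, 10) are selected.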

4. The method according to claim 3, further comprising: pushing the key dialogue to a dialogue construction module, so that the dialogue construction module uses the key dialogue as the phrasing of dialogue nodes in the dialogue flow model to be built.

5. The method according to claim 1, wherein extracting the semantic vector of each of the multiple rounds of dialogue comprises: for each dialogue, concatenating the multiple rounds of dialogue in that dialogue to obtain the dialogue sequence of that dialogue; adding a sequence identifier to the beginning of each dialogue sequence to distinguish the sequence, and adding a corresponding round identifier to each round of dialogue to distinguish the round of dialogue; and inputting the dialogue sequence into a semantic vector extraction model to extract the semantic vector of each of the multiple rounds of dialogue.
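
The sequence construction in claim 5 can be sketched as follows. The marker strings `[SEQ]` and `[ROUND_n]` and the function name `build_dialogue_sequence` are hypothetical; the claim only requires that a sequence identifier and per-round identifiers be present, not any particular token format.

```python
from typing import List, Sequence


def build_dialogue_sequence(rounds: Sequence[str], seq_token: str = "[SEQ]") -> str:
    """Concatenate rounds into one sequence with a leading sequence identifier
    and a distinct round identifier in front of each round."""
    parts: List[str] = [seq_token]
    for number, text in enumerate(rounds, start=1):
        parts.append(f"[ROUND_{number}]")  # hypothetical round identifier
        parts.append(text)
    return " ".join(parts)


sequence = build_dialogue_sequence(["hi", "hello, how can I help?"])
# sequence == "[SEQ] [ROUND_1] hi [ROUND_2] hello, how can I help?"
```

The resulting string would then be tokenized and passed to the semantic vector extraction model, which is outside the scope of this sketch.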

6. The method according to claim 1, wherein performing clustering calculations using the semantic vectors of the multiple rounds of dialogue to obtain clustering results includes: storing the semantic vectors of the multiple rounds of dialogue in a feature pool; using the feature pool as the initial segmentation range; randomly selecting one round of dialogue within the segmentation range as a segmentation point; calculating the difference between the forward aggregation feature and the backward aggregation feature of the segmentation point, and selecting the position with the largest difference as the optimized segmentation point of the segmentation range; dividing the rounds of dialogue within the segmentation range into two parts at the optimized segmentation point; using each of the two parts as an updated segmentation range; for a segmentation range whose number of rounds of dialogue does not meet a preset dialogue-number requirement, marking the segmentation range as completely segmented; if the number of optimized segmentation points has not reached a preset number of segmentation points, returning, for each updated segmentation range, to the step of randomly selecting one round of dialogue within the segmentation range as a segmentation point; and if the number of optimized segmentation points reaches the preset number of segmentation points, using all optimized segmentation points as initialization points for the KMeans cluster centers, adjusting the optimized segmentation points with the KMeans clustering algorithm, and determining the clustering results based on the adjusted optimized segmentation points.
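
The segmentation-based search for initialization points in claim 6 can be approximated by the sketch below. It is a simplification under stated assumptions, not the claimed procedure itself: the aggregation feature is taken to be the mean vector on each side of a position, the difference is Euclidean distance, every position in a range is scanned instead of starting from a randomly selected round, ranges are processed first-in first-out, and the sketch stops at producing the initialization points without running KMeans. The names `best_split` and `find_split_points` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    count = len(vectors)
    return [sum(v[d] for v in vectors) / count for d in range(len(vectors[0]))]


def best_split(vectors: Sequence[Vector], lo: int, hi: int) -> Tuple[int, float]:
    """Return the position p in (lo, hi) where the mean of vectors[lo:p]
    and the mean of vectors[p:hi] differ the most, with that difference."""
    best_pos, best_score = lo + 1, -1.0
    for p in range(lo + 1, hi):
        forward = mean_vector(vectors[lo:p])    # forward aggregation (assumption)
        backward = mean_vector(vectors[p:hi])   # backward aggregation (assumption)
        score = math.dist(forward, backward)
        if score > best_score:
            best_pos, best_score = p, score
    return best_pos, best_score


def find_split_points(
    vectors: Sequence[Vector], num_points: int, min_rounds: int = 2
) -> List[int]:
    """Recursively split ranges until num_points split positions are found."""
    min_rounds = max(min_rounds, 2)  # a range needs two rounds to be split
    ranges = [(0, len(vectors))]
    points: List[int] = []
    while ranges and len(points) < num_points:
        lo, hi = ranges.pop(0)
        if hi - lo < min_rounds:
            continue  # too few rounds: treat the range as completely segmented
        p, _ = best_split(vectors, lo, hi)
        points.append(p)
        ranges.extend([(lo, p), (p, hi)])
    return sorted(points)


pool = [[0.0], [0.1], [0.2], [5.0], [5.1], [9.0], [9.2]]
split_points = find_split_points(pool, num_points=2)
# Vectors at the split positions would seed the KMeans cluster centers.
initial_centers = [pool[p] for p in split_points]
```

With this toy feature pool the two boundaries fall where the values jump, at positions 3 and 5, and the vectors at those positions could be handed to a KMeans implementation as initial centers.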

7. The method according to claim 1, wherein determining whether the topics of the multiple rounds of dialogue meet the preset requirements further includes: for each dialogue, counting the topics that appear consecutively among the multiple rounds of dialogue to obtain the number of times each topic is consecutively repeated in the dialogue; and determining topics whose number of consecutive repetitions does not reach a preset number of consecutive repetitions as topics that do not meet the preset requirements.
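
The consecutive-repetition check in claim 7 can be sketched with run-length counting. The function name `topics_below_run_length` is hypothetical, and the sketch assumes that a topic is judged by its longest consecutive run within the dialogue, which the claim does not spell out.

```python
from itertools import groupby
from typing import Dict, Hashable, Sequence, Set


def topics_below_run_length(topics: Sequence[Hashable], min_run: int) -> Set[Hashable]:
    """Return topics whose longest consecutive run is shorter than min_run."""
    longest: Dict[Hashable, int] = {}
    # groupby yields one group per maximal run of equal adjacent topics.
    for topic, run in groupby(topics):
        length = sum(1 for _ in run)
        longest[topic] = max(longest.get(topic, 0), length)
    return {topic for topic, length in longest.items() if length < min_run}


dialogue_topics = ["a", "a", "b", "a", "c", "c", "c"]
rejected = topics_below_run_length(dialogue_topics, min_run=2)
```

In this example only topic "b" never appears in two adjacent rounds, so it is the only topic that fails the requirement.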

8. The method according to claim 5, wherein the semantic vector extraction model is a multi-layer Transformer model, wherein the final Transformer layer is used to predict the semantic vectors of the multiple rounds of dialogue in an autoregressive manner, the autoregressive manner referring to predicting the semantic vectors of later rounds of dialogue from the semantic vectors of earlier rounds of dialogue within a single dialogue.

9. The method according to claim 8, further comprising: before training the semantic vector extraction model, for each dialogue sample set, concatenating the multiple rounds of dialogue samples in the dialogue sample set of that dialogue, together with additional rounds of dialogue samples, to obtain the dialogue sequence sample of that dialogue; and inputting the dialogue sequence sample into the semantic vector extraction model for training to obtain the trained semantic vector extraction model, wherein during training the semantic vector extraction model uses the final Transformer layer to predict the semantic vector of each round of dialogue autoregressively and adjusts the model parameters based on the prediction results for the additional rounds of dialogue.

10. The method according to claim 1, wherein performing clustering calculations using the semantic vectors of the multiple rounds of dialogue to obtain clustering results includes: performing clustering calculations on the semantic vectors of the multiple rounds of dialogue with a data-density-based clustering algorithm to obtain the clustering results.

11. The method according to claim 10, wherein performing the clustering calculation using the semantic vectors of the multiple rounds of dialogue and the data-density-based clustering algorithm includes: setting the clustering parameters of the data-density-based clustering algorithm based on a first clustering accuracy, and performing clustering calculations using the semantic vectors of the multiple rounds of dialogue to obtain a first clustering result; removing noise points from the multiple rounds of dialogue based on the first clustering result to obtain updated multiple rounds of dialogue; and setting the clustering parameters of the data-density-based clustering algorithm based on a second clustering accuracy, and performing clustering calculations using the semantic vectors of the updated multiple rounds of dialogue to obtain a second clustering result, wherein the second clustering accuracy is greater than the first clustering accuracy.
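
The two-pass density clustering in claim 11 can be illustrated with a compact DBSCAN-style routine. The claim names only "a clustering algorithm based on data density", so the choice of DBSCAN is an assumption, as is the mapping of a higher "clustering accuracy" to a smaller neighborhood radius `eps`. The function names `dbscan` and `two_pass_clustering` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

NOISE = -1


def dbscan(points: Sequence[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Minimal DBSCAN: returns one label per point, NOISE for noise points."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n

    def neighbors(i: int) -> List[int]:
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        found = neighbors(i)
        if len(found) < min_samples:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in found if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point: keep expanding
    return [label if label is not None else NOISE for label in labels]


def two_pass_clustering(
    points: Sequence[Sequence[float]], coarse_eps: float, fine_eps: float, min_samples: int
) -> Tuple[List[int], List[int]]:
    """First pass removes noise with a loose radius; second pass re-clusters
    the remaining points with a tighter radius (assumed to mean higher accuracy)."""
    first = dbscan(points, coarse_eps, min_samples)
    kept = [i for i, label in enumerate(first) if label != NOISE]
    second = dbscan([points[i] for i in kept], fine_eps, min_samples)
    return kept, second


data = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2], [10.0]]
kept, labels = two_pass_clustering(data, coarse_eps=2.0, fine_eps=0.15, min_samples=2)
```

In this toy data the first pass discards the outlier at 10.0, and the second pass separates the remaining points into two tighter clusters.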

12. A system for processing dialogue data, comprising: a client configured to send dialogue data to a server, wherein the dialogue data includes multiple rounds of dialogue, to receive key dialogues returned by the server in response to the dialogue data, and to use the key dialogues as dialogue nodes in the dialogue flow model to be constructed; and the server, configured to acquire the dialogue data, extract the semantic vector of each of the multiple rounds of dialogue, perform clustering calculations using the semantic vectors of the multiple rounds of dialogue to obtain clustering results, calculate, based on the clustering results, the vector distance between each round of dialogue in each category and the centroid of that category, select, based on the vector distance, the round of dialogue closest to the centroid from each category as the key dialogue, and send the key dialogue to the client; wherein the server is further configured to determine whether the topics of the multiple rounds of dialogue meet preset requirements, delete topics that do not meet the preset requirements from the topics of the multiple rounds of dialogue, calculate, for each round of dialogue whose topic was deleted, the distance between the semantic vector of that round of dialogue and the semantic vectors of other rounds of dialogue, and select, based on the calculated distance, the topic of another round of dialogue as the topic of that round of dialogue; and wherein determining whether the topics of the multiple rounds of dialogue meet the preset requirements includes: for each dialogue, counting the topics that appear repeatedly among the multiple rounds of dialogue to obtain the number of times each topic appears repeatedly in the dialogue, and determining topics whose number of appearances does not reach a preset number of appearances as topics that do not meet the preset requirements.