Semantic effect evaluation method and related apparatus

By constructing a multi-turn dialogue test set and a structured model, the method solves the problem that evaluation cannot proceed past nodes with semantic recognition errors, preserving the complete dialogue flow, enabling accurate evaluation of semantic effects, and reducing the dependence on and cost of online testing.

CN114492461B · Active · Publication Date: 2026-06-26 · IFLYTEK SOUTH CHINA ARTIFICIAL INTELLIGENCE RES INST GUANGZHOU CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
IFLYTEK SOUTH CHINA ARTIFICIAL INTELLIGENCE RES INST GUANGZHOU CO LTD
Filing Date
2021-12-24
Publication Date
2026-06-26

Abstract

The application discloses a semantic effect evaluation method and related apparatus. The semantic effect evaluation method comprises: obtaining a to-be-evaluated dialogue, wherein the to-be-evaluated dialogue comprises predicted text related to a user intention; inputting the to-be-evaluated dialogue into a multi-turn dialogue test set and verifying the predicted text of different nodes in the to-be-evaluated dialogue by using the multi-turn dialogue test set; in response to at least one error node with a recognition error existing in the to-be-evaluated dialogue, reconstructing the content after the error node in the to-be-evaluated dialogue based on the multi-turn dialogue test set to obtain a first dialogue; and evaluating the first dialogue based on all nodes in the first dialogue. In this way, the semantic effect can be verified offline, and when a node with a semantic recognition error is encountered during testing, the remaining part after the error node is reconstructed in real time, so that the dialogue can continue to flow into the next node and the next sentence can be evaluated, and the semantic effect of all nodes is ultimately evaluated completely.

Description

Technical Field

[0001] This application belongs to the field of semantic understanding technology, specifically relating to a semantic effect evaluation method and related apparatus.

Background Technology

[0002] In recent years, with the significant advances in natural language understanding achieved by artificial intelligence, dialogue systems have been increasingly widely applied in various industrial scenarios, such as "voice assistants," "smart speakers," and "intelligent outbound call robots." Dialogue systems can be broadly categorized by purpose into task-oriented, question-and-answer, and casual conversation types, and by the number of interaction rounds into single-turn and multi-turn dialogues. Through multiple rounds of information exchange with the user, multi-turn dialogues can obtain more accurate user information, thereby offering users more diverse and better experiences and serving more complex needs. Since multi-turn dialogues require multiple rounds of interaction with the user to achieve the desired outcome, the transitions between dialogue nodes constitute the dialogue flow. The complexity of the dialogue flow depends on the number of dialogue rounds and the business scenario: the more rounds there are and the more information has to be exchanged, the more complex the dialogue flow tends to be.

[0003] The dialogue flow is also divided into different dialogue layers based on the depth of the dialogue content, such as the "self-introduction layer," "information confirmation layer," "product introduction layer," and "intent acquisition layer." Different dialogue layers contain different intent nodes, so the scope of user semantic recognition varies in each round of interaction. In particular, some intents have no absolute boundaries and become semantically entangled, so entangled intents often have to be distinguished by intent priority. Furthermore, in complex dialogue scenarios, semantic relevance and information inheritance exist between different rounds of dialogue, further complicating semantic recognition. Additionally, every optimization of semantic understanding or dialogue flow design requires A/B testing. However, online business volume may not be sufficient to support A/B testing of every optimization point, or online A/B testing may be costly, and if a node with a semantic recognition error is encountered during testing, the process cannot proceed to the next node for evaluation.

[0004] Therefore, a new method for evaluating semantic effects is urgently needed to solve the above problems.

Summary of the Invention

[0005] The main technical problem addressed by this application is to provide a semantic effect evaluation method and related apparatus to solve the problem that when a node with semantic recognition error is encountered during the testing process, the evaluation cannot continue to the next node.

[0006] To address the aforementioned technical problems, this application provides a semantic effect evaluation method, comprising: obtaining a dialogue to be evaluated; wherein the dialogue to be evaluated includes predicted text related to user intent; inputting the dialogue to be evaluated into a multi-turn dialogue test set, and using the multi-turn dialogue test set to verify the predicted text of different nodes in the dialogue to be evaluated; in response to the presence of at least one incorrectly identified node in the dialogue to be evaluated, reconstructing the content after the incorrect node in the dialogue to be evaluated based on the multi-turn dialogue test set to obtain a first dialogue; and evaluating the first dialogue based on all nodes in the first dialogue.

[0007] The multi-turn dialogue test set is constructed based on a first text in the current dialogue and a second text in the historical dialogue, where the first text and the second text are the user's voice text in the dialogue. The construction process of the multi-turn dialogue test set includes: obtaining first structured information corresponding to the first text; performing similarity matching between the first structured information and the second structured information corresponding to each second text; and combining the current dialogue and the historical dialogue corresponding to the second structured information to construct the multi-turn dialogue test set in response to the similarity between the first structured information and the second structured information being greater than a preset threshold.

[0008] Wherein, the first structured information includes a first intent tag and a first tag tree corresponding to the first text, and the second structured information includes a second intent tag and a second tag tree corresponding to the second text; the step of performing similarity matching between the first structured information and the second structured information corresponding to each second text includes: based on the second intent tag in the second structured information, obtaining third structured information from all the second structured information where the second intent tag is consistent with the first intent tag; and performing similarity matching between the first tag tree in the first structured information and the second tag tree in the third structured information.

[0009] The step of obtaining the first structured information corresponding to the first text includes: inputting the first text into a structured model and obtaining a first tag tree corresponding to the first text based on the structured model; obtaining a first intent tag corresponding to each of the first texts; and combining the first tag tree and the first intent tag to obtain the first structured information.

[0010] The step of obtaining the first tag tree corresponding to the first text based on the structured model includes: using BERT encoding and CRF decoding to predict the interval-type labels in the first text and their start and end positions, and obtaining the main intent node corresponding to the first text through interaction with an attention mechanism; and constructing the first tag tree corresponding to the first text based on the interval-type labels, their start and end positions, and the main intent node.

[0011] The step of obtaining the first intent label corresponding to each first text includes: performing BERT encoding on the first text to obtain a first feature vector; performing GCN encoding on the first tag tree to obtain a second feature vector; performing an attention mechanism interaction between the first feature vector and the second feature vector to obtain a third feature vector; and obtaining the first intent label corresponding to the first text based on the third feature vector.

[0012] The step of reconstructing the content after the erroneous node in the dialogue to be evaluated based on the multi-turn dialogue test set to obtain the first dialogue includes: for the dialogue to be evaluated, constructing at least one correct node after the erroneous node, based on the multi-turn dialogue test set and on the tag and intent tag corresponding to the erroneous node, so as to form the first dialogue.

[0013] The step of evaluating the first dialogue based on all nodes in the first dialogue includes: evaluating the first dialogue based on the completion rate of all nodes in the first dialogue.
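The flow of paragraphs [0006], [0012] and [0013] can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: the names `GoldTurn`, `evaluate_dialogue` and the `recognize` callback are hypothetical, reconstruction is simplified to "continue with the next verified turn from the test set", and the completion rate is assumed to be the fraction of nodes whose predicted intent matches the test-set label.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class GoldTurn:
    node: str         # flow node at which this user utterance is understood
    utterance: str    # user text for this turn
    gold_intent: str  # verified intent label stored in the test set


def evaluate_dialogue(
    turns: List[GoldTurn],
    recognize: Callable[[str, str], str],
) -> Tuple[float, List[str]]:
    """Verify every node and keep going after a recognition error.

    Returns (completion_rate, error_nodes). The completion-rate formula
    is an assumption made for this sketch.
    """
    error_nodes: List[str] = []
    correct = 0
    for turn in turns:
        predicted = recognize(turn.node, turn.utterance)
        if predicted == turn.gold_intent:
            correct += 1
        else:
            # Error node: record it. The wrong branch is not followed;
            # the loop simply continues with the next verified turn,
            # which stands in for the reconstructed content.
            error_nodes.append(turn.node)
    completion_rate = correct / len(turns) if turns else 0.0
    return completion_rate, error_nodes


def toy_recognize(node: str, utterance: str) -> str:
    # Stand-in for the semantic understanding system under test.
    return "change_package" if "change" in utterance else "unclear"


turns = [
    GoldTurn("product_intro", "what packages are available now", "ask_promotion"),
    GoldTurn("intent_acquisition", "can this package be changed", "change_package"),
]
rate, errors = evaluate_dialogue(turns, toy_recognize)
```

In this toy run the first node is misrecognized, yet the second node is still evaluated, which is the behavior the reconstruction step is meant to guarantee.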

[0014] To address the aforementioned technical problems, another technical solution adopted in this application is: providing a semantic effect evaluation device, comprising: an acquisition module for acquiring a dialogue to be evaluated; wherein the dialogue to be evaluated includes predicted text related to user intent; a verification module, coupled to the acquisition module, for inputting the dialogue to be evaluated into a multi-turn dialogue test set, and using the multi-turn dialogue test set to verify the predicted text of users at different nodes in the dialogue to be evaluated; a reconstruction module, coupled to the verification module, for reconstructing the content after the erroneous node in the dialogue to be evaluated based on the multi-turn dialogue test set in response to the existence of at least one incorrectly identified node in the dialogue to be evaluated, to obtain a first dialogue; and an evaluation module, coupled to the reconstruction module, for evaluating the first dialogue based on all nodes in the first dialogue.

[0015] To solve the above-mentioned technical problems, another technical solution adopted in this application is to provide an electronic device, including a memory and a processor coupled to each other, wherein the memory stores program instructions, and the processor is used to execute the program instructions to implement the semantic effect evaluation method mentioned in any of the above embodiments.

[0016] To address the aforementioned technical problems, another technical solution adopted in this application is to provide a computer-readable storage medium storing a computer program for implementing the semantic effect evaluation method mentioned in any of the above embodiments.

[0017] Unlike existing technologies, the beneficial effects of this application are as follows: The semantic effect evaluation method provided by this application includes: obtaining a dialogue to be evaluated; wherein the dialogue to be evaluated includes predicted text related to user intent; inputting the dialogue to be evaluated into a multi-turn dialogue test set, and using the multi-turn dialogue test set to verify the predicted text of different nodes in the dialogue to be evaluated; then, when there is at least one incorrectly identified node in the dialogue to be evaluated, reconstructing the content after the incorrect node in the dialogue to be evaluated based on the multi-turn dialogue test set to obtain a first dialogue; finally, evaluating the first dialogue based on all nodes in the first dialogue. Through this design, the verification method combines semantics with process nodes to solve the problem of entanglement of corresponding intents in different nodes, and uses offline data to construct a test set, so that the semantic effect can be verified offline. Furthermore, when a node with semantic recognition error is encountered during the test, the remaining part after the incorrect node is reconstructed in real time, so that it can continue to flow to the next node to evaluate the next sentence of dialogue, and finally fully realize the evaluation of the semantic effect of all nodes.

Attached Figure Description

[0018] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort, wherein:

[0019] Figure 1 is an example diagram showing the state transitions between different dialogue nodes in a multi-turn dialogue;

[0020] Figure 2 is a flowchart of one embodiment of the semantic effect evaluation method of this application;

[0021] Figure 3 is a schematic diagram of the multi-turn dialogue test set data;

[0022] Figure 4 is a schematic diagram of the Utterance-Token structure of multi-turn dialogue text;

[0023] Figure 5 is a flowchart of the process of constructing a multi-turn dialogue test set;

[0024] Figure 6 is a flowchart of one embodiment of step S10 in Figure 5;

[0025] Figure 7 is a flowchart of one embodiment of step S20 in Figure 6;

[0026] Figure 8 is a schematic diagram of a structured model;

[0027] Figure 9 is a flowchart of one embodiment of step S21 in Figure 6;

[0028] Figure 10 is a flowchart of one embodiment of step S11 in Figure 5;

[0029] Figure 11 is an example of constructing multi-turn dialogue semantic test set data in real time;

[0030] Figure 12 is a schematic diagram of the framework of one embodiment of the semantic effect evaluation device of this application;

[0031] Figure 13 is a schematic diagram of the framework of one embodiment of the electronic device of this application;

[0032] Figure 14 is a schematic diagram of the framework of one embodiment of the computer-readable storage medium of this application.

Detailed Implementation

[0033] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of the embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.

[0034] Current methods for analyzing the semantic effects of multi-turn dialogue generally include the following: (1) Encoding emotional features together with the user's current sentence. Compared with using the sentence alone, this method adds emotional supervision signals and noticeably improves recognition of emotion-related intents; (2) Using the user's current sentence and the preceding dialogue history as input and training the semantic model with a multi-task objective. This method can achieve accurate responses in cross-language multi-turn human-computer dialogue in a low-resource corpus environment; (3) Replacing the user steps or robot steps in the dialogue process with corresponding statements according to the test data to generate test cases, then testing the dialogue robot with the test cases and obtaining and displaying the test results. However, these methods have the following drawbacks: (1) The analysis of the semantic effect of multi-turn dialogue is not combined with the flowchart, so it cannot resolve the semantic entanglement that may exist at different levels of the process. Moreover, the semantic distribution tested at different levels often differs. Performing semantic recognition over the entire semantic space greatly enlarges the recognition space, which increases both the recognition difficulty and the entanglement between semantics; (2) Existing offline data is not used to build a test set, and A/B testing before and after optimization relies on online data only; (3) The impact of process optimization or semantic optimization on the completion rate of dialogue tasks cannot be evaluated accurately.

[0035] The semantic effect evaluation method provided in this application tests existing semantic understanding systems and involves the construction of a multi-turn dialogue test set and a new flow testing method. Please refer to Figure 1, which is an example diagram showing the state transitions between different dialogue nodes in a multi-turn dialogue. In constructing the multi-turn dialogue test set, semantic understanding is performed on existing multi-turn dialogue data, followed by manual verification. Correct dialogue data can be used directly as test data. For erroneous dialogue data, correct test data is constructed using the method for constructing the multi-turn dialogue test set. Each test set includes multiple turns of dialogue, and each test set carries the flow information it is testing. In the flow testing method, the multi-turn dialogues in the test data are sequentially input into the corresponding flow nodes in the semantic understanding system for semantic understanding. Only correct dialogues continue to the next node for understanding of the next sentence. The semantic effect evaluation method provided in this application is introduced below.

[0036] Please see Figure 2, which is a flowchart of one embodiment of the semantic effect evaluation method of this application. The aforementioned semantic effect evaluation method specifically includes:

[0037] S1: Obtain the dialogue to be evaluated.

[0038] Specifically, the dialogue to be evaluated includes predicted text related to the user's intent. In this embodiment, the dialogue system predicts the user's intent text based on process nodes and evaluates this intent text to obtain semantic understanding recognition performance.

[0039] S2: Input the dialogue to be evaluated into a multi-turn dialogue test set, and use the multi-turn dialogue test set to verify the predicted text of different nodes in the dialogue to be evaluated.

[0040] Specifically, validating the semantic effect of a multi-turn dialogue system first requires constructing a multi-turn dialogue test set. This test set includes all possible dialogue transition paths in the entire dialogue flow. Each sample contains a complete flow path, all nodes in that path, the user's query at the current stage, and the corresponding correctly labeled intent. The sample data in this test set resembles real dialogue but contains no samples with semantic recognition errors. However, since machine semantic recognition in real-world scenarios cannot reach 100% accuracy, some intent nodes in the flow will be recognized incorrectly. Therefore, the sample data saved offline has to be filtered and reorganized to construct the multi-turn dialogue test set required in this application, i.e., multi-turn dialogue sample data free of semantic recognition errors. In this embodiment, two types of business scenario are distinguished: simple scenarios, which mainly involve single-turn dialogues or multi-turn dialogues with no semantic correlation between turns; and complex scenarios, in which there is semantic inheritance or correlation between turns.
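The sample layout and the filtering step described above might look like the following sketch. The class and field names (`LoggedTurn`, `LoggedDialogue`, `predicted_intent`, `verified_intent`) and the helper `split_logs` are hypothetical; the source only states that each sample holds a flow path, its nodes, the user query and the correctly labeled intent, and that samples with recognition errors must not enter the test set as they are.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LoggedTurn:
    node: str              # flow node of this turn
    query: str             # user's query at the current stage
    predicted_intent: str  # intent recognized by the system when logged
    verified_intent: str   # intent label after manual verification


@dataclass
class LoggedDialogue:
    flow_path: List[str]   # complete flow path, as an ordered list of nodes
    turns: List[LoggedTurn]


def is_error_free(dialogue: LoggedDialogue) -> bool:
    """True when every logged prediction agrees with the verified label."""
    return all(t.predicted_intent == t.verified_intent for t in dialogue.turns)


def split_logs(
    logs: List[LoggedDialogue],
) -> Tuple[List[LoggedDialogue], List[LoggedDialogue]]:
    """Separate dialogues usable as-is from those that need reorganizing."""
    clean = [d for d in logs if is_error_free(d)]
    faulty = [d for d in logs if not is_error_free(d)]
    return clean, faulty


logs = [
    LoggedDialogue(
        ["intro", "offer"],
        [LoggedTurn("intro", "hello", "greet", "greet")],
    ),
    LoggedDialogue(
        ["intro", "offer"],
        [LoggedTurn("offer", "what packages are available now",
                    "unclear", "ask_promotion")],
    ),
]
clean, faulty = split_logs(logs)
```

Dialogues in `clean` could go into the test set directly, while those in `faulty` would be candidates for the reorganization described in the construction steps that follow.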

[0041] Specifically, in this embodiment, the multi-turn dialogue test set is constructed based on the first text in the current dialogue and the second text in the historical dialogue. The first and second texts are the voice texts of users in the dialogue, and the current dialogue may include at least one first text. Furthermore, the users in the current dialogue and the historical dialogue can be different users or the same user. Please refer to Figures 3-5: Figure 3 is a schematic diagram of the multi-turn dialogue test set data, Figure 4 is a schematic diagram of the Utterance-Token structure of multi-turn dialogue text, and Figure 5 is a flowchart of the process of constructing a multi-turn dialogue test set. Specifically, the process of constructing a multi-turn dialogue test set includes:

[0042] S10: Obtain the first structured information corresponding to the first text.

[0043] Specifically, the user's text (Utterance) mentioned above is structured to obtain its corresponding structured information. This structured information refers to the tree structure obtained by extracting tokens from the unstructured text and constructing a token tree. The structured information also includes the intent tag obtained through intent recognition of the current unstructured text. In general, the intent tag and token tree corresponding to the unstructured text are collectively referred to as the structured information of the text. For example, as shown in Figure 4, the token tags corresponding to each round of user Utterance are extracted and combined according to a predefined general semantic structure tree to obtain a token tag tree. In addition, the intent tag corresponding to each round of user Utterance is obtained through semantic judgment.

[0044] Specifically, in this embodiment, please refer to Figure 6, which is a flowchart of one embodiment of step S10 in Figure 5. Specifically, step S10 includes:

[0045] S20: Input the first text into the structured model and obtain the first tag tree corresponding to the first text based on the structured model.

[0046] Specifically, in this embodiment, as shown in Figure 4, if the content of the user's first text in the multi-turn dialogue is "Um, what packages are available now?", then the token label corresponding to the first text is "Question" + "Existing Packages" → "Activity Content"; if the content of the user's first text in the multi-turn dialogue is "Hey, I'd like to ask if this package can be changed?", then the token label corresponding to the first text is "Inquiry" + "Marketing Package" → "Change"; if the content of the user's first text in the multi-turn dialogue is "So, 19 + 19, a total of 38 yuan?", then the token label corresponding to the first text is "Question" + "Marketing Package" → "Specific Amount (19)", "Specific Amount (19)", "Specific Amount (38)" and "Operator - Addition". Specifically, each first text in the current dialogue is input into the structured model to obtain the first tag tree corresponding to that first text (i.e., the structured token system of the Utterance-Token structure in Figure 4).
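One possible in-memory form of the structured information (a tag tree plus an intent tag) is sketched below for the first Figure 4 example. The class names `TokenNode` and `StructuredInfo` are hypothetical, and the tree shape, with "Activity Content" as the parent of "Question" and "Existing Packages", is only one reading of the arrow notation above; the source does not spell out the exact shape.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TokenNode:
    label: str
    children: List["TokenNode"] = field(default_factory=list)

    def labels(self) -> List[str]:
        """Return all labels in pre-order (parent before children)."""
        out = [self.label]
        for child in self.children:
            out.extend(child.labels())
        return out


@dataclass
class StructuredInfo:
    utterance: str       # raw user text
    intent_tag: str      # intent tag of the utterance
    tag_tree: TokenNode  # root of the token tag tree


# Assumed tree shape for: "Question" + "Existing Packages" -> "Activity Content"
tree = TokenNode(
    "Activity Content",
    [TokenNode("Question"), TokenNode("Existing Packages")],
)
info = StructuredInfo(
    utterance="Um, what packages are available now?",
    intent_tag="What promotion?",  # intent tag given for this utterance in Figure 4
    tag_tree=tree,
)
node_labels = info.tag_tree.labels()
```

Keeping the intent tag next to the tree lets later matching steps compare the cheap field (the intent tag) before comparing trees.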

[0047] Furthermore, in this embodiment, human experts define the types of user intent for rejecting marketing in a specific marketing task scenario (L = [l1, l2, ..., ln]). For example, in the 4G to 5G upgrade service of China Telecom, the user rejection type tags include "busy," "with Wi-Fi," "sufficient data," "able to receive speed reduction," "forgot to cancel," and "unclear intent," etc. An intent recognition scheme is then used to identify the intent of the current user content. In practice, however, several kinds of intent have proven difficult to identify, such as multiple types of intent, negation intent, accounting intent, and consultation intent. The mainstream semantic representation method based on BERT encoding does not capture these difficult intents well. This application therefore proposes an intent recognition scheme based on semantic structure parsing. The scheme first structures the user's current expression based on semantic token tags, then uses a graph decoding scheme to construct a semantic tag graph, and finally identifies the user's intent tag through matching metrics.

[0048] Specifically, please refer to Figures 7-8: Figure 7 is a flowchart of one embodiment of step S20 in Figure 6, and Figure 8 is a schematic diagram of the structured model. Specifically, the structured model includes a graph analysis program, a BERT encoder, a CRF decoder, and a tag extractor. In step S20, obtaining the first tag tree corresponding to the first text based on the structured model includes:

[0049] S200: BERT encoding and CRF decoding are used to predict and obtain the interval class labels and their start and end points in the first text, and the main intent node corresponding to the first text is obtained through an attention mechanism.

[0050] Specifically, in this embodiment, as shown in Figure 8, if the input, i.e., the content of the current user's first text, is "Can I change my bank card inquiry password?", it is input into the structured model. In the structured model, the BERT encoder and CRF decoder sequentially predict the interval-type labels and their start and end positions in the first text. For example, for this first text, the interval-type labels may be "Savings Card," "Password Inquiry," "Modify," and "Process" in the Node Memory W of Figure 8, with the start position at "Savings Card" and the end position at "Process." The intents of the first text are "Modify" and "Process." Through an attention mechanism, the primary intent node corresponding to the first text, i.e., the tag-type token, is predicted; here the primary intent node of the first text is "Modify." Furthermore, in this embodiment, the extracted interval-type labels are also referred to as intent nodes (concepts) in the constructed graph.
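To make "interval-type labels with start and end positions" concrete, the sketch below converts a per-token tag sequence into labeled spans. It assumes the CRF decoder emits BIO-style tags (`B-`, `I-`, `O`), which the source does not state; the function name `bio_to_spans`, the label spellings and the tag sequence are hypothetical, and the BERT and CRF models themselves are not shown.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags into (label, start, end) spans, end inclusive."""
    spans: List[Tuple[str, int, int]] = []
    label: Optional[str] = None
    start = 0
    # A trailing "O" sentinel flushes a span that runs to the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.append((label, start, i - 1))
            label = None
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, start = tag[2:], i
    return spans


# Hypothetical decoder output for a six-token utterance.
tags = ["B-Modify", "B-SavingsCard", "I-SavingsCard",
        "B-PasswordInquiry", "I-PasswordInquiry", "O"]
spans = bio_to_spans(tags)
```

Each resulting span would correspond to one intent node (concept) that the later tree construction step can attach to the graph.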

[0051] S201: Construct the first tag tree corresponding to the first text based on the interval-type labels, their start and end positions, and the main intent node.

[0052] Specifically, in this embodiment, a tag tree is constructed from the multiple intent nodes identified in step S200 above. If there is only one primary intent node among the multiple intent nodes, only one tag tree needs to be constructed; if there are multiple primary intent nodes, multiple tag trees need to be constructed. In this embodiment: A) the first tag tree is initialized with the primary intent node, that is, the primary intent node serves as the initial graph state, and a node t is selected from the extracted interval-type labels (i.e., intent nodes), where the selection order can run from the start position to the end position; B) in the partial graph constructed so far, a parent node is selected for node t, and information from the constructed partial subtree can be incorporated when selecting the parent node; C) the graph and the graph state are updated; D) steps A to C are repeated until an END node is selected, which ends the construction process. The user Utterance of each round in Figure 4 is the input to the model. The first tag tree is a predefined tree structure composed of nodes. Nodes are arranged layer by layer from the root node to the leaf nodes, and a higher-level node can become the parent node of a lower-level node connected to it. The update in step C drives the entire construction of the first tag tree, which proceeds layer by layer downwards from the root node: as shown in Figure 8, the sequence G_t → g(·) → Graph State → f(·) yields the nodes of the next level in turn. As shown in Figures 4 and 8, the Initial Graph State is the initial state during the construction of the first tag tree in the structured model, and it is the root virtual node. Furthermore, g(·) is the head selection module, and f(·) is the node selection module. In this embodiment, the above steps can be executed in the graph analysis program within the structured model. The current graph is saved to the graph memory by the graph encoder, and the first tag tree is constructed by continuously updating the Graph State.
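The A to D loop can be illustrated with the following sketch, in which the learned modules f(·) (node selection) and g(·) (head selection) are replaced by plain callbacks. Everything here is an assumption made for illustration: the function `build_tag_tree`, the child-to-parent dictionary used as graph state, the `END` marker string, and the toy selectors do not come from the source, and node labels are assumed to be unique.

```python
from typing import Callable, Dict, List, Optional

END = "<END>"
Tree = Dict[str, Optional[str]]  # child label -> parent label (root -> None)


def build_tag_tree(
    main_intent: str,
    candidates: List[str],
    select_node: Callable[[Tree, List[str]], str],  # plays the role of f(.)
    select_head: Callable[[Tree, str], str],        # plays the role of g(.)
) -> Tree:
    """Grow a tag tree one node at a time until END is selected."""
    tree: Tree = {main_intent: None}  # initial graph state
    remaining = [c for c in candidates if c != main_intent]
    while True:
        node = select_node(tree, remaining)  # step A: pick the next node t
        if node == END:                      # step D: stop on END
            break
        head = select_head(tree, node)       # step B: pick a parent for t
        tree[node] = head                    # step C: update graph and state
        remaining.remove(node)
    return tree


def pick_next(tree: Tree, remaining: List[str]) -> str:
    # Toy f(.): take candidates in start-to-end order.
    return remaining[0] if remaining else END


def attach_to_latest(tree: Tree, node: str) -> str:
    # Toy g(.): use the most recently added node as the parent.
    return list(tree)[-1]


tag_tree = build_tag_tree(
    "Modify",
    ["Savings Card", "Password Inquiry", "Modify", "Process"],
    pick_next,
    attach_to_latest,
)
```

In the structured model described above, both selectors would be scored by the network using the current Graph State; the toy callbacks only show where those decisions plug into the loop.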

[0053] S21: Obtain the first intent label corresponding to each first text.

[0054] Specifically, the user's first text input in each round of dialogue is matched against the natural sentence patterns under each intent in the resource library. The first tag tree of the first text is constructed through step S20, and, following steps A to D described above, the tag tree corresponding to each natural sentence pattern under each intent in the resource library is constructed offline. That is, in this embodiment, multiple natural sentence patterns under the intents and their corresponding tag trees are pre-constructed in the resource library. Furthermore, a graph transformer and a BERT encoder encode the first tag tree and the first text respectively to obtain their representations, which then interact through an attention mechanism to produce the final representation of the first text. Specifically, in this embodiment, please refer to Figure 9, which is a flowchart of one embodiment of step S21 in Figure 6. Specifically, step S21 includes:

[0055] S210: Encode the first text using BERT to obtain the first feature vector, and encode the first tag tree using GCN to obtain the second feature vector.

[0056] Specifically, the first text and the corresponding first tag tree are encoded using BERT and GCN respectively to obtain their respective representations, namely the first feature vector m and the second feature vector n.

[0057] S211: Interact the first feature vector and the second feature vector through an attention mechanism to obtain the third feature vector.

[0058] Specifically, the first feature vector m and the second feature vector n obtained in step S210 are interacted through an attention mechanism to obtain the third feature vector o.
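The source does not specify the form of the attention interaction. As one plausible reading, the sketch below treats m as a list of text token vectors and n as a list of tag tree node vectors, lets each text vector attend over the tree vectors with scaled dot-product attention, concatenates each text vector with its attended summary, and mean-pools the result into o. The pooling, the concatenation and all function names are assumptions, and real BERT or GCN encoders are replaced by hand-written vectors.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention; the keys also serve as values."""
    dim = len(keys[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        attended.append(
            [sum(w * k[j] for w, k in zip(weights, keys)) for j in range(dim)]
        )
    return attended


def fuse(m: List[Vector], n: List[Vector]) -> Vector:
    """Combine text vectors m with tree vectors n into one vector o."""
    attended = cross_attention(m, n)
    joined = [q + a for q, a in zip(m, attended)]  # concatenate per token
    return [sum(col) / len(joined) for col in zip(*joined)]  # mean-pool


m = [[1.0, 0.0]]              # stand-in for the BERT text representation
n = [[1.0, 0.0], [1.0, 0.0]]  # stand-in for the GCN tag tree representation
o = fuse(m, n)
```

With identical tree vectors the attention weights are uniform, so the attended summary equals either tree vector and `o` is simply the text vector followed by that summary.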

[0059] S212: Obtain the first intent label corresponding to the first text based on the third feature vector.

[0060] Specifically, based on the third feature vector o obtained in step S211 above, the first intent label corresponding to the first text is obtained from the resource library. This can be achieved through CRF decoding or other means, which are not limited here. For example, as shown in Figure 4, the first intent tag for the first text "Um, what packages are available now?" is "What promotion?"; the first intent tag for the first text "Hey, I'd like to ask if this package can be changed?" is "Change package", etc.

[0061] Specifically, in this embodiment, to construct a multi-turn dialogue test set offline, a tag tree corresponding to each natural sentence structure under each intent in the matching resources of the knowledge base is constructed offline in the same manner. Then, a Graph Transformer is used to perform GCN encoding on the tag tree corresponding to user content in historical dialogues, and the encoded sentence representations are saved offline to the multi-turn dialogue test set. This allows intent tags for any user content to be obtained through matching queries. Query matching is a method of intent recognition; however, the intent recognition involved here does not employ classification but rather uses matching to identify the intent of the input user content.
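Intent recognition by matching rather than classification can be pictured as a nearest-neighbor lookup over the representations saved offline. The sketch below assumes cosine similarity as the matching metric and a flat in-memory list as the index; neither is stated in the source, and the vectors are made-up stand-ins for encoded sentence representations.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_intent(
    query: Vector,
    index: List[Tuple[str, Vector]],
) -> Tuple[str, float]:
    """Return the intent tag of the most similar stored representation."""
    scored = [(intent, cosine(query, vec)) for intent, vec in index]
    return max(scored, key=lambda pair: pair[1])


# Offline index: one stored representation per natural sentence pattern.
index = [
    ("What promotion?", [1.0, 0.0]),
    ("Change package", [0.0, 1.0]),
]
intent, score = match_intent([0.1, 0.9], index)
```

Because the index is just stored vectors, adding a new sentence pattern under an intent only requires encoding it and appending it, with no retraining of a classifier.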

[0062] S22: Combine the first tag tree and the first intent tag to obtain the first structured information.

[0063] Specifically, after steps S20 and S21 have produced the first tag tree and the first intent tag corresponding to the first text, the two are combined into the Utterance-Token structure shown in Figure 4. At this point, the first structured information corresponding to the first text is obtained.
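
A minimal sketch of how the first structured information might be represented follows. The class names `TagNode` and `StructuredInfo`, the field layout, and the tag labels "change" and "product" are hypothetical; only the example text and its intent tag come from the description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TagNode:
    """One node of a tag tree; start/end are character offsets (end exclusive)."""
    label: str
    start: int = -1  # -1 marks a node without a text span, e.g. the main intent node
    end: int = -1
    children: List["TagNode"] = field(default_factory=list)


@dataclass
class StructuredInfo:
    """Tag tree plus intent tag for one user utterance."""
    text: str
    intent_tag: str
    tag_tree: TagNode


def combine(text: str, tag_tree: TagNode, intent_tag: str) -> StructuredInfo:
    """Step S22: bundle the tag tree and the intent tag into one record."""
    return StructuredInfo(text=text, intent_tag=intent_tag, tag_tree=tag_tree)


text = "Hey, let me ask if this package can be changed?"
start = text.index("package")
span = TagNode("product", start, start + len("package"))  # hypothetical tag label
tree = TagNode("change", children=[span])                 # hypothetical main intent node
info = combine(text, tree, "Change package")
```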

[0064] S11: Perform similarity matching between the first structured information and the second structured information corresponding to each second text.

[0065] Specifically, the first structured information includes a first intent label and a first tag tree corresponding to the first text, and the second structured information includes a second intent label and a second tag tree corresponding to the second text. In this embodiment, please refer to Figure 10, which is a flowchart illustrating an implementation of step S11 in Figure 5. Specifically, step S11 includes:

[0066] S110: Based on the second intent label in the second structured information, obtain third structured information from all the second structured information where the second intent label is consistent with the first intent label.

[0067] Specifically, as described above, the intent label is identified by performing GCN encoding on the tag tree, interacting the result with the encoded representation of the text through an attention mechanism to output a feature vector, and then obtaining the current intent label by retrieval and ranking. In this embodiment, third structured information whose corresponding second intent label is consistent with the first intent label is obtained from all second structured information.

[0068] S111: Perform similarity matching between the first tag tree in the first structured information and the second tag tree in the third structured information.

[0069] Specifically, the similarity matching in this step is done through a pipeline, first determining whether the intent tags are consistent, and then determining whether the nodes in the tag tree are consistent.
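
The two-stage pipeline of steps S110 and S111 can be sketched as a filter followed by a score. The embodiment does not state how tag tree similarity is measured, so the Jaccard overlap of node labels used here is an assumption, as are the names `StructuredInfo`, `tree_similarity`, and `pipeline_match`.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class StructuredInfo(NamedTuple):
    intent_tag: str
    tree_nodes: FrozenSet[str]  # flattened labels of the tag tree nodes


def tree_similarity(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard overlap of node labels (an assumed stand-in for tree matching)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pipeline_match(
    first: StructuredInfo, seconds: List[StructuredInfo]
) -> List[Tuple[StructuredInfo, float]]:
    # S110: keep only candidates whose intent tag equals the first intent tag.
    third = [s for s in seconds if s.intent_tag == first.intent_tag]
    # S111: score the tag trees of the surviving candidates.
    return [(s, tree_similarity(first.tree_nodes, s.tree_nodes)) for s in third]


first = StructuredInfo("Change package", frozenset({"change", "package"}))
seconds = [
    StructuredInfo("Change package", frozenset({"change", "package", "today"})),
    StructuredInfo("What promotion?", frozenset({"package"})),
    StructuredInfo("Change package", frozenset({"cancel"})),
]
matches = pipeline_match(first, seconds)
```

Filtering on the intent tag first means the more expensive tree comparison only runs on candidates that could match at all.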

[0070] S12: In response to the similarity between the first structured information and the second structured information being greater than a preset threshold, the current dialogue and the historical dialogues corresponding to the second structured information are combined to construct a multi-turn dialogue test set.

[0071] Specifically, when the structured information (intent labels + nodes in the tag tree) of two dialogues is similar, they can be considered similar dialogue data and combined to construct a multi-turn dialogue test set. This involves combining the first text corresponding to the first structured information with the second text corresponding to the third structured information. In other words, when the input is the first text, the structured information corresponding to the second text can be obtained, allowing entry into the corresponding flow node. Practice has shown that the test set data combined using this method conforms to the fluency and rationality of real dialogue data. Since this construction method requires the use of the preceding information, the basic unit of construction relies on the combination of multiple interconnected dialogue segments. Theoretically, the more dialogue turns a correct dialogue segment contains, the better the coherence and relevance of the content of the complete dialogue constructed from multiple dialogues. The constructed complete dialogue also includes a certain dialogue flow path in the dialogue flow diagram. All dialogue flow paths can be combined through different combination methods.
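
One plausible reading of the combination in step S12 is sketched below: the current dialogue is kept up to the matched first text, and the historical dialogue supplies everything after the matched second text. The splice rule, the default threshold of 0.8, the function name, and the sample turns other than the quoted first text are assumptions for illustration only.

```python
from typing import List, Optional


def combine_dialogues(
    current: List[str],
    matched_current: int,
    history: List[str],
    matched_history: int,
    similarity: float,
    threshold: float = 0.8,  # hypothetical preset threshold
) -> Optional[List[str]]:
    """Splice two dialogues at their matched user turns.

    Returns None when the similarity does not exceed the threshold.
    """
    if similarity <= threshold:
        return None
    # Keep the current dialogue through the matched turn, then continue
    # with the historical dialogue after its matched turn.
    return current[: matched_current + 1] + history[matched_history + 1 :]


current = ["User: Um, what packages are available now?"]
history = [
    "User: Which packages can I get?",
    "Agent: (reply from the historical dialogue)",
    "User: (next user turn from the historical dialogue)",
]
combined = combine_dialogues(current, 0, history, 0, similarity=0.9)
rejected = combine_dialogues(current, 0, history, 0, similarity=0.5)
```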

[0072] This design approach proposes a multi-turn dialogue test set for complex scenarios, which filters combinations of different segments based on user tag matching. This makes the construction of a complete dialogue from multiple dialogue segments closer to real dialogue data in terms of content inheritance and fluency. Moreover, this data is used to evaluate the current semantic effect and the results of the dialogue flow on the user's dialogue state transitions, which can avoid the reliance on online scenarios for A / B testing during the solution optimization process.

[0073] S3: Determine whether the dialogue to be evaluated contains any incorrectly identified error node.

[0074] Specifically, after constructing a complete multi-turn dialogue test set, the semantic recognition performance can be verified offline. For example, as shown in Figure 3, the input is a piece of multi-turn dialogue data. Starting from the initial node, each turn cycles from "current user content intent recognition" to "intent jump according to recognition result" until the "current user content intent recognition" result at some node is found to be wrong, which means that semantic recognition has made an intent recognition error at that node.
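
The verification loop of step S3 can be sketched as a walk over the turns that stops at the first mismatch between the recognized intent and the intent stored in the test set. The names `Turn` and `find_error_node`, the node names, and the dictionary-based toy recognizer are assumptions; a real system would call the semantic model in place of `recognize`.

```python
from typing import Callable, List, NamedTuple, Optional


class Turn(NamedTuple):
    node: str             # flow node at which the user speaks
    text: str             # user utterance
    expected_intent: str  # intent label stored in the test set


def find_error_node(turns: List[Turn], recognize: Callable[[str], str]) -> Optional[int]:
    """Return the index of the first turn whose recognized intent is wrong, else None."""
    for index, turn in enumerate(turns):
        if recognize(turn.text) != turn.expected_intent:
            return index
    return None


# Toy recognizer that gets the second utterance wrong.
toy_model = {
    "Um, what packages are available now?": "What promotion?",
    "Hey, let me ask if this package can be changed?": "What promotion?",
}
turns = [
    Turn("Node1", "Um, what packages are available now?", "What promotion?"),
    Turn("Node2", "Hey, let me ask if this package can be changed?", "Change package"),
]
error_index = find_error_node(turns, lambda text: toy_model.get(text, "unknown"))
```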

[0075] S4: If so, then reconstruct the content after the erroneous node in the dialogue to be evaluated based on the multi-turn dialogue test set to obtain the first dialogue.

[0076] Specifically, in this embodiment, please refer to Figure 11, which is an example of real-time construction of a multi-turn dialogue semantic test set. Even when encountering semantic recognition errors, the entire dialogue process must be completed (including user hang-ups). This aligns with real-world online scenarios where, even if machine semantic recognition errors occur during human-computer dialogue, the conversation can continue until the dialogue process is finished. Specifically, if there is at least one erroneous node in the dialogue to be evaluated, the content after the erroneous node in the dialogue to be evaluated is reconstructed based on the multi-turn dialogue test set to obtain the first dialogue. For example, as shown in Figure 11, during multi-turn offline testing of Session A, a semantic recognition error is encountered. At this point, the dialogue and intent of the current error are retained, and the dialogue content of the next part is "constructed" according to the current erroneous intent in order to reconstruct a new first dialogue.

[0077] Specifically, step S4 includes: for the dialogue to be evaluated, constructing at least one correct node after the erroneous node based on the multi-turn dialogue test set and the tag and intent tag corresponding to the erroneous node to construct the first dialogue. Specifically, for the dialogue to be evaluated, if the process jumps incorrectly at Node3 due to an Internet32 intent recognition error, Node5 is constructed after Internet32 at the erroneous node Node3 according to the tag and intent tag corresponding to the erroneous node Node3, based on the multi-turn dialogue test set, so that the dialogue can continue even when the machine semantic recognition is incorrect, until the dialogue process is completed. Of course, multiple correct nodes can also be constructed after the erroneous node; this application does not limit this.
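
A minimal sketch of step S4 under stated assumptions: the multi-turn dialogue test set is modelled as a lookup from (error node, recognized intent) to continuation turns, which is a hypothetical indexing scheme, and the sample utterances and intents are illustrative. The node names Node3 and Node5 follow the example above.

```python
from typing import Dict, List, NamedTuple, Tuple


class Turn(NamedTuple):
    node: str
    text: str
    intent: str


# Hypothetical test-set index: (node, recognized intent) -> continuation turns.
Continuations = Dict[Tuple[str, str], List[Turn]]


def reconstruct(
    dialogue: List[Turn],
    error_index: int,
    recognized_intent: str,
    continuations: Continuations,
) -> List[Turn]:
    """Keep the dialogue up to the error node and rebuild what follows it."""
    error_turn = dialogue[error_index]
    # Retain the erroneous turn together with the intent that was actually recognized.
    kept = dialogue[:error_index] + [error_turn._replace(intent=recognized_intent)]
    # Continue from the node the flow jumps to under the erroneous intent.
    tail = continuations.get((error_turn.node, recognized_intent), [])
    return kept + tail


dialogue = [
    Turn("Node1", "Um, what packages are available now?", "What promotion?"),
    Turn("Node3", "Hey, let me ask if this package can be changed?", "Change package"),
    Turn("Node4", "(original next turn)", "(original intent)"),
]
continuations = {
    ("Node3", "What promotion?"): [Turn("Node5", "(constructed turn)", "(constructed intent)")],
}
first_dialogue = reconstruct(dialogue, 1, "What promotion?", continuations)
```

The original turn at Node4 is dropped because the flow no longer reaches it once the wrong intent has been followed.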

[0078] S5: Evaluate the first dialogue based on all nodes in the first dialogue.

[0079] Optionally, in this embodiment, step S5 specifically includes: evaluating the first dialogue based on the completion rate of all nodes in the first dialogue. Through this design, the scheme for real-time reconstruction of the dialogue to be evaluated can fully realize the evaluation of the semantic effects of all nodes, and by statistically analyzing the sample completion rate of all exit nodes in the semantic dialogue task, it can serve as an indicator for evaluating the effectiveness of process optimization.
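
The completion-rate indicator can be illustrated as follows. The embodiment does not give a formula, so this sketch assumes that each evaluated sample is recorded as an exit node plus a flag saying whether the task completed there, and that the rate is the completed count divided by the total per exit node; the exit node names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def completion_rate_by_exit(samples: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of samples that completed, grouped by exit node."""
    totals: Counter = Counter()
    completed_counts: Counter = Counter()
    for exit_node, completed in samples:
        totals[exit_node] += 1
        if completed:
            completed_counts[exit_node] += 1
    return {node: completed_counts[node] / totals[node] for node in totals}


# Hypothetical evaluation records: (exit node, completed?).
samples = [("Exit_A", True), ("Exit_A", False), ("Exit_B", True), ("Exit_A", True)]
rates = completion_rate_by_exit(samples)
```

Comparing these rates before and after a change to the dialogue flow gives an offline indication of whether the change helped.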

[0080] S6: Otherwise, evaluate the dialogue to be evaluated based on all nodes in the dialogue to be evaluated.

[0081] Specifically, if there are no incorrectly identified nodes in the dialogue to be evaluated, it indicates that all process nodes in the dialogue are correct. Therefore, the dialogue to be evaluated is then evaluated based on all nodes in the dialogue. For example, as shown in Figure 1, the Internet31 intent was correctly identified at node Node3, allowing a direct jump to the next node, Node4. The evaluation method is the same as in step S5 and will not be repeated here. The sample completion rate of all exit nodes in the semantic dialogue task can be used as an indicator to evaluate the effectiveness of process optimization.

[0082] In general, this application combines semantics with process nodes to solve the problem of entangled intentions in different nodes, and uses offline data to build a multi-turn dialogue test set, and uses an offline method to verify the semantic effect. Furthermore, when encountering nodes with semantic recognition errors during the test, the remaining part of the current dialogue data is reconstructed in real time.

[0083] Please see Figure 12, which is a schematic diagram of a framework for one embodiment of the semantic effect evaluation device of this application. The semantic effect evaluation device specifically includes:

[0084] The first acquisition module 11 is used to acquire the dialogue to be evaluated; wherein the dialogue to be evaluated includes predicted text related to the user's intent.

[0085] The verification module 12, coupled to the first acquisition module 11, is used to input the dialogue to be evaluated into a multi-turn dialogue test set and use the multi-turn dialogue test set to verify the predicted text of users at different nodes in the dialogue to be evaluated.

[0086] The reconstruction module 13, coupled to the verification module 12, is used to reconstruct the content after the erroneous node in the dialogue to be evaluated based on a multi-turn dialogue test set in response to the presence of at least one erroneous node with an incorrect identification in the dialogue to be evaluated, so as to obtain the first dialogue.

[0087] Evaluation module 14, coupled to reconstruction module 13, is used to evaluate the first dialogue based on all nodes in the first dialogue.
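
The coupling of modules 11 to 14 can be sketched as a small class that chains four callables. This shows only the wiring; the class name, the callable signatures, and the stand-in lambdas are assumptions and do not reflect how each module is actually implemented.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Dialogue = List[str]


@dataclass
class SemanticEffectEvaluator:
    acquire: Callable[[], Dialogue]                   # first acquisition module 11
    verify: Callable[[Dialogue], Optional[int]]       # verification module 12
    reconstruct: Callable[[Dialogue, int], Dialogue]  # reconstruction module 13
    evaluate: Callable[[Dialogue], float]             # evaluation module 14

    def run(self) -> float:
        dialogue = self.acquire()
        error_index = self.verify(dialogue)
        if error_index is not None:
            # An error node exists: rebuild the content after it first.
            dialogue = self.reconstruct(dialogue, error_index)
        return self.evaluate(dialogue)


# Stand-in callables, for illustration only.
evaluator = SemanticEffectEvaluator(
    acquire=lambda: ["turn 1", "turn 2", "turn 3"],
    verify=lambda d: 1,
    reconstruct=lambda d, i: d[: i + 1] + ["rebuilt turn"],
    evaluate=lambda d: float(len(d)),
)
score = evaluator.run()
```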

[0088] In one embodiment, the semantic effect evaluation device provided in this application further includes a first construction module 10, which is coupled to a first acquisition module 11. The multi-turn dialogue test set is constructed based on the first text in the current dialogue and the second text in the historical dialogue. The first text and the second text are the user's voice text in the dialogue. The first construction module 10 is used to construct the multi-turn dialogue test set. Specifically, in this embodiment, the first construction module 10 further includes a second acquisition module, a matching module, and a processing module. The second acquisition module is coupled to the matching module and is used to acquire the first structured information corresponding to the first text. The matching module is coupled to the processing module and is used to perform similarity matching between the first structured information and the second structured information corresponding to each second text. The processing module is used to combine the current dialogue and the historical dialogue corresponding to the second structured information in response to the similarity between the first structured information and the second structured information being greater than a preset threshold, so as to construct the multi-turn dialogue test set.

[0089] Optionally, in this embodiment, the first structured information includes a first intent label and a first tag tree corresponding to the first text, and the second structured information includes a second intent label and a second tag tree corresponding to the second text. Furthermore, the matching module includes a third obtaining module and a similarity module. The third obtaining module is coupled to the similarity module, wherein the third obtaining module is used to obtain third structured information from all second structured information where the second intent label is consistent with the first intent label, based on the second intent label in the second structured information; the similarity module is used to perform similarity matching between the first tag tree in the first structured information and the second tag tree in the third structured information.

[0090] Optionally, in this embodiment, the second obtaining module includes a tag tree module, an intent tag module, and a combination module. The tag tree module is coupled to the intent tag module, and the intent tag module is coupled to the combination module. Specifically, the tag tree module is used to input the first text into the structured model and obtain the first tag tree corresponding to the first text based on the structured model; the intent tag module is used to obtain the first intent tag corresponding to each first text; and the combination module is used to combine the first tag tree and the first intent tag to obtain the first structured information.

[0091] Optionally, in this embodiment, the tag tree module includes a node module and a second construction module that are coupled to each other. The node module is used to predict and obtain the interval-type tag labels and their start and end point positions in the first text using BERT encoding and CRF decoding, and to obtain the main intent node corresponding to the first text through an attention mechanism. The second construction module is used to construct the first tag tree corresponding to the first text based on the interval-type tag labels and their start and end point positions and the main intent node.
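
How interval-type tag labels with start and end positions might be turned into a tag tree is sketched below. It assumes the CRF decoder emits BIO tags per token, which the embodiment does not state, and the dictionary layout of the tree, the function names, and the labels "product", "action", and "change" are hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[str, int, int]  # (label, start, end) in token positions, end exclusive


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect labelled spans from a BIO tag sequence."""
    spans: List[Span] = []
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # trailing sentinel flushes the last span
        inside = tag.startswith("I-") and tag[2:] == label
        if label is not None and not inside:
            spans.append((label, start, i))
            label = None
        if tag.startswith("B-"):
            label, start = tag[2:], i
    return spans


def build_tag_tree(main_intent: str, spans: List[Span]) -> Dict:
    """Attach every span as a child of the main intent node."""
    children = [{"label": lab, "start": s, "end": e} for lab, s, e in spans]
    return {"label": main_intent, "children": children}


tags = ["O", "B-product", "I-product", "O", "B-action"]
spans = bio_to_spans(tags)
tree = build_tag_tree("change", spans)
```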

[0092] Specifically, in this embodiment, the intent label module includes a feature module and a fourth acquisition module that are coupled to each other. The feature module is used to encode the first text using BERT to obtain a first feature vector and to encode the first label tree using GCN to obtain a second feature vector. The feature module is also used to interact the first feature vector and the second feature vector using an attention mechanism to obtain a third feature vector. The fourth acquisition module is used to obtain the first intent label corresponding to the first text based on the third feature vector.

[0093] Optionally, in this embodiment, the reconstruction module 13 includes a third construction module, which is used to construct at least one correct node after the error node to construct the first dialogue based on the multi-turn dialogue test set and the tag labels and intent labels corresponding to the error nodes.

[0094] Optionally, in this embodiment, the evaluation module 14 includes a completion rate module, used to evaluate the first dialogue based on the completion rate of all nodes in the first dialogue.

[0095] Please see Figure 13, which is a schematic diagram of a framework of one embodiment of the electronic device of this application. The electronic device includes a memory 20 and a processor 22 coupled to each other. Specifically, in this embodiment, the memory 20 stores program instructions, and the processor 22 is used to execute the program instructions to implement the semantic effect evaluation method mentioned in any of the above embodiments.

[0096] Specifically, processor 22 can also be referred to as a CPU (Central Processing Unit). Processor 22 may be an integrated circuit chip with signal processing capabilities. Processor 22 can also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. A general-purpose processor can be a microprocessor or any conventional processor. Furthermore, processor 22 can be implemented using multiple integrated circuit chips.

[0097] Please see Figure 14, which is a schematic diagram illustrating one embodiment of the computer-readable storage medium of this application. The computer-readable storage medium 30 stores a computer program 300, which can be read by a computer and executed by a processor to implement the semantic effect evaluation method mentioned in any of the above embodiments. The computer program 300 can be stored in the computer-readable storage medium 30 in the form of a software product, including several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) or processor to execute all or part of the steps of the methods described in the various embodiments of this application. The computer-readable storage medium 30 with storage function can be any medium capable of storing program code, such as a USB flash drive, portable hard drive, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk, or a terminal device such as a computer, server, mobile phone, or tablet.

[0098] In summary, unlike existing technologies, the semantic effect evaluation method provided in this application includes: obtaining a dialogue to be evaluated; wherein the dialogue to be evaluated includes predicted text related to user intent; inputting the dialogue to be evaluated into a multi-turn dialogue test set, and using the multi-turn dialogue test set to verify the predicted text of different nodes in the dialogue to be evaluated; then, when there is at least one incorrectly identified node in the dialogue to be evaluated, reconstructing the content after the incorrect node in the dialogue to be evaluated based on the multi-turn dialogue test set to obtain a first dialogue; finally, evaluating the first dialogue based on all nodes in the first dialogue. Through this design, the verification method combines semantics with process nodes to solve the problem of entanglement of corresponding intents in different nodes, and utilizes offline data to construct a test set, thus enabling offline verification of semantic effects. Furthermore, when encountering a node with semantic recognition errors during testing, the remaining part after the incorrect node is reconstructed in real time, allowing it to continue flowing to the next node to evaluate the next sentence of dialogue, ultimately achieving a complete evaluation of the semantic effects of all nodes.

[0099] The above description is merely an embodiment of this application and does not limit the patent scope of this application. Any equivalent structural or procedural transformations made using the content of this application's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of this application.

Claims

1. A semantic effect evaluation method, characterized in that it comprises: obtaining a dialogue to be evaluated, wherein the dialogue to be evaluated includes predicted text related to user intent; inputting the dialogue to be evaluated into a multi-turn dialogue test set, and verifying the predicted text of different nodes in the dialogue to be evaluated using the multi-turn dialogue test set; in response to the presence of at least one incorrectly identified error node in the dialogue to be evaluated, reconstructing the content following the error node in the dialogue to be evaluated based on the multi-turn dialogue test set to obtain a first dialogue; and evaluating the first dialogue based on all nodes in the first dialogue; wherein the multi-turn dialogue test set is constructed based on the first text in the current dialogue and the second text in the historical dialogue, the first text and the second text being the user's voice text in the dialogue; and the construction process of the multi-turn dialogue test set includes: obtaining first structured information corresponding to the first text, wherein the first structured information includes the first tag tree and the first intent tag corresponding to the first text; performing similarity matching between the first structured information and the second structured information corresponding to each second text, wherein the second structured information includes the second intent tag and the second tag tree corresponding to the second text; and in response to the similarity between the first structured information and the second structured information being greater than a preset threshold, combining the current dialogue and the historical dialogue corresponding to the second structured information to construct the multi-turn dialogue test set.

2. The semantic effect evaluation method according to claim 1, characterized in that the step of performing similarity matching between the first structured information and the second structured information corresponding to each second text includes: based on the second intent tag in the second structured information, obtaining third structured information from all the second structured information where the second intent tag is consistent with the first intent tag; and performing similarity matching between the first tag tree in the first structured information and the second tag tree in the third structured information.

3. The semantic effect evaluation method according to claim 1, characterized in that the step of obtaining the first structured information corresponding to the first text includes: inputting the first text into a structured model, and obtaining a first tag tree corresponding to the first text based on the structured model; obtaining the first intent tag corresponding to each of the first texts; and combining the first tag tree and the first intent tag to obtain the first structured information.

4. The semantic effect evaluation method according to claim 3, characterized in that the step of obtaining the first tag tree corresponding to the first text based on the structured model includes: using BERT encoding and CRF decoding to predict the interval-type tag labels and their start and end point positions in the first text, and obtaining the main intent node corresponding to the first text through an attention mechanism; and constructing the first tag tree corresponding to the first text based on the interval-type tag labels, their start and end point positions, and the main intent node.

5. The semantic effect evaluation method according to claim 3, characterized in that the step of obtaining the first intent tag corresponding to each of the first texts includes: performing BERT encoding on the first text to obtain a first feature vector, and performing GCN encoding on the first tag tree to obtain a second feature vector; interacting the first feature vector and the second feature vector through an attention mechanism to obtain a third feature vector; and obtaining the first intent tag corresponding to the first text based on the third feature vector.

6. The semantic effect evaluation method according to claim 1, characterized in that the step of reconstructing the content following the error node in the dialogue to be evaluated based on the multi-turn dialogue test set to obtain the first dialogue includes: for the dialogue to be evaluated, constructing at least one correct node after the error node based on the multi-turn dialogue test set and the tag label and intent tag corresponding to the error node, so as to construct the first dialogue.

7. The semantic effect evaluation method according to claim 1, characterized in that the step of evaluating the first dialogue based on all nodes in the first dialogue includes: evaluating the first dialogue based on the completion rate of all nodes in the first dialogue.

8. A semantic effect evaluation device, characterized in that it comprises: an acquisition module, used to acquire the dialogue to be evaluated, wherein the dialogue to be evaluated includes predicted text related to user intent; a verification module, coupled to the acquisition module, used to input the dialogue to be evaluated into a multi-turn dialogue test set and use the multi-turn dialogue test set to verify the predicted text of users at different nodes in the dialogue to be evaluated; a reconstruction module, coupled to the verification module, used to reconstruct the content after the error node in the dialogue to be evaluated based on the multi-turn dialogue test set, in response to the presence of at least one incorrectly identified error node in the dialogue to be evaluated, so as to obtain a first dialogue; and an evaluation module, coupled to the reconstruction module, used to evaluate the first dialogue based on all nodes in the first dialogue; wherein the multi-turn dialogue test set is constructed based on the first text in the current dialogue and the second text in the historical dialogue, the first text and the second text being the user's voice text in the dialogue; and the construction process of the multi-turn dialogue test set includes: obtaining first structured information corresponding to the first text, wherein the first structured information includes the first tag tree and the first intent tag corresponding to the first text; performing similarity matching between the first structured information and the second structured information corresponding to each second text, wherein the second structured information includes the second intent tag and the second tag tree corresponding to the second text; and in response to the similarity between the first structured information and the second structured information being greater than a preset threshold, combining the current dialogue and the historical dialogue corresponding to the second structured information to construct the multi-turn dialogue test set.

9. An electronic device, characterized in that the electronic device includes a memory and a processor coupled to each other, wherein the memory stores program instructions and the processor executes the program instructions to implement the semantic effect evaluation method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for implementing the semantic effect evaluation method according to any one of claims 1 to 7.