Session management method, apparatus and electronic device
By using an intent recognition model and a dialogue selection model together with an intent stream database and a response dialogue database, logically consistent response dialogues are determined automatically. This addresses the heavy workload and the logical inconsistency of manual configuration when managing conversations in complex scenarios, making conversation management efficient and accurate.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- JD DIGITS HAIYI INFORMATION TECHNOLOGY CO LTD
- Filing Date
- 2022-07-21
- Publication Date
- 2026-06-19
AI Technical Summary
Existing methods for managing conversations in complex scenarios require extensive manual configuration and are highly likely to select illogical response dialogues, resulting in a huge workload and a long configuration cycle.
By using an intent recognition model and a dialogue selection model together with an intent stream database and a response dialogue database, logically consistent response dialogues can be determined automatically, reducing the workload of manual configuration.
Logically sound response dialogues are selected rapidly and accurately, reducing the workload and cycle of manual configuration and improving the efficiency of session management.
Figure CN115271024B_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of artificial intelligence technology, and in particular to a session management method, apparatus, and electronic device.
Background Technology
[0002] Intelligent chatbots are increasingly being used in many business scenarios. One of the core aspects of most complex intelligent chatbots is conversation management, which means that in human-computer dialogue, after receiving information from a human, the robot selects a response based on that information.
[0003] In complex scenarios, configuring conversation management for chatbots is a massive and tedious task. It requires setting corresponding robot responses based on human experience for various possible scenarios during the conversation, and then selecting a response from the robot after receiving the human's statement to complete the conversation management. This results in a huge workload for manual configuration and a high probability of selecting illogical responses when answering statements.
[0004] Therefore, providing an efficient and accurate session management method is an important problem that urgently needs to be solved.
Summary of the Invention
[0005] This disclosure provides a session management method, apparatus, and electronic device to address the shortcoming of existing technologies, which often select illogical response statements when responding to a statement to be identified, thereby enabling accurate and rapid selection of logical response statements based on the statement to be identified.
[0006] This disclosure provides a session management method, including:
[0007] The acquired statement to be recognized is input into the intent recognition model, and the first intent data is output.
[0008] Based on the first intent data, the subsequent intent data of the target intent data in the corresponding target intent stream data is determined in the intent stream database. The intent stream database consists of multiple intent stream data, and the intent stream data consists of multiple intent data in order. The subsequent intent data is the intent data that is adjacent to the target intent data and is sorted later in the intent stream data.
[0009] Based on the subsequent intent data, at least one candidate dialogue data is obtained by matching in the response dialogue database according to a preset mapping relationship.
[0010] The first intent data and at least one of the candidate dialogue data are input into the dialogue selection model, and the target response dialogue data is determined based on the output results.
[0011] According to a session management method provided in this disclosure, the intent recognition model is trained through the following steps:
[0012] Input the sample of the statement to be identified into the pre-built initial intent recognition model, and output the dialogue intent category sample, wherein the sample of the statement to be identified has a pre-labeled dialogue intent category label;
[0013] Calculate the first loss value based on the dialogue intent category sample and the pre-labeled dialogue intent category label;
[0014] The first parameter of the initial intent recognition model is adjusted according to the first loss value, and the initial intent recognition model is updated.
[0015] If the first loss value is greater than the first preset threshold, return to re-execute the step of inputting the sample of the statement to be identified into the pre-built initial intent recognition model and outputting the dialogue intent category sample;
[0016] If the first loss value is less than or equal to the first preset threshold, the updated initial intent recognition model is determined to be the intent recognition model.
[0017] According to a session management method provided in this disclosure, the step of determining the subsequent intent data of the target intent data in the corresponding target intent stream data in the intent stream database based on the first intent data includes:
[0018] Match the corresponding target intent data in the intent stream database based on the first intent data;
[0019] Based on the target intent data, determine the target intent stream data;
[0020] The intent data that is adjacent to the target intent data and is ranked later in the target intent stream data is determined as the subsequent intent data.
[0021] According to a session management method provided in this disclosure, the dialogue selection model is trained through the following steps:
[0022] Input the dialogue selection sample into the initial dialogue selection model, and output the sample classification label and the probability corresponding to the sample classification label. The dialogue selection sample includes an intent data sample and a candidate dialogue data sample, and the dialogue selection sample has a pre-labeled classification label.
[0023] The second loss value is calculated based on the sample classification label, the probability corresponding to the sample classification label, and the pre-labeled classification label;
[0024] The second parameter of the initial dialogue selection model is adjusted according to the second loss value, and the initial dialogue selection model is updated.
[0025] If the second loss value is greater than the second preset threshold, return to re-execute the step of inputting the dialogue selection sample into the initial dialogue selection model and outputting the sample classification label and the probability corresponding to the sample classification label;
[0026] If the second loss value is less than or equal to the second preset threshold, the updated initial dialogue selection model is determined to be the dialogue selection model.
[0027] According to a conversation management method provided in this disclosure, the step of inputting the first intent data and at least one candidate dialogue data into a dialogue selection model, and determining the target response dialogue data based on the output result, includes:
[0028] For each candidate dialogue data, the first intent data and the candidate dialogue data are taken together as one item of data to be identified, resulting in at least one item of data to be identified.
[0029] The at least one item of data to be identified is input into the dialogue selection model one by one to obtain at least one classification label and the probability corresponding to the classification label;
[0030] According to preset rules, the classification labels of a specified category are selected as target response labels, and at least one target response label is obtained;
[0031] The target response dialogue data is determined based on the probability corresponding to the at least one target response label and the corresponding candidate dialogue data.
[0032] According to a session management method provided in this disclosure, the step of determining the target response dialogue data based on the probability corresponding to the at least one target response label and the corresponding candidate dialogue data includes:
[0033] Among the at least one target response label, the candidate dialogue data corresponding to the target response label with the highest probability is determined as the target response dialogue data.
[0034] This disclosure also provides a session management apparatus, including:
[0035] The intent recognition unit is used to input the acquired statement to be recognized into the intent recognition model and output the first intent data.
[0036] The first matching unit is used to determine, in the intent stream database and according to the first intent data, the subsequent intent data of the target intent data in the corresponding target intent stream data. The intent stream database consists of multiple intent stream data, and the intent stream data consists of multiple intent data in order. The subsequent intent data is the intent data that is adjacent to the target intent data and is sorted later in the intent stream data.
[0037] The second matching unit is used to match the subsequent intent data in the response dialogue database according to a preset mapping relationship to obtain at least one candidate dialogue data.
[0038] The determining unit is used to input the first intent data and at least one of the candidate dialogue data into the dialogue selection model, and determine the target response dialogue data based on the output result.
[0039] This disclosure also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of any of the session management methods described above.
[0040] This disclosure also provides a non-transitory computer-readable storage medium having a computer program stored thereon that, when executed by a processor, implements the steps of any of the session management methods described above.
[0041] This disclosure also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of any of the session management methods described above.
[0042] The session management method, apparatus, and electronic device disclosed herein preliminarily determine the intent of the statement to be identified by inputting it into an intent recognition model, determine a logically consistent subsequent intent based on that intent, input the subsequent intent and at least one candidate dialogue corresponding to it into a dialogue selection model, and select a logically consistent target response dialogue through the dialogue selection model. Intent streams derived from real dialogue data can include as many nodes and paths as possible, far more than manual configuration or rule summarization can, and in particular cover more long-tail cases. A target response dialogue configured at a node by the dialogue selection model has a high probability of conforming to the logic of the dialogue context, which avoids the difficulty of controlling dialogue logic during manual configuration, greatly reduces the workload and cycle of manual configuration, and achieves accurate and rapid selection of logically consistent response dialogues based on the statement to be identified.
Attached Figure Description
[0043] To more clearly illustrate the technical solutions in this disclosure or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this disclosure. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0044] Figure 1 is a flowchart illustrating the session management method provided in this disclosure;
[0045] Figure 2 is a flowchart illustrating the training method of the intent recognition model provided in this disclosure;
[0046] Figure 3 is a schematic diagram of the training process of the initial intent recognition model provided in this disclosure;
[0047] Figure 4 is a flowchart illustrating the training method of the dialogue selection model provided in this disclosure;
[0048] Figure 5 is a schematic diagram of the training process of the initial dialogue selection model provided in this disclosure;
[0049] Figure 6 is a schematic diagram of the structure of the session management apparatus provided in this disclosure;
[0050] Figure 7 is a schematic diagram of the structure of the electronic device provided in this disclosure.
Detailed Implementation
[0051] To make the objectives, technical solutions, and advantages of the embodiments of this disclosure clearer, the technical solutions of the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of this disclosure, and not all of them. Based on the embodiments of this disclosure, all other embodiments obtained by those skilled in the art without creative effort are within the protection scope of the embodiments of this disclosure.
[0052] Intelligent chatbots, including intelligent customer service robots, intelligent outbound call robots, and intelligent training robots, are increasingly being used in various business scenarios. One of the core elements of most complex intelligent chatbots is session management: in human-computer dialogue, after receiving information from a human, the robot reacts to that information and gives feedback. Intelligent chatbots can interact in various ways, including text and voice. In most cases, however, even when a non-text interaction method such as voice is used, the information is converted to text and input into the session management system. The session management system then outputs the feedback text, which is converted back to voice or another format before being sent to the user.
[0053] Configuring conversation management for chatbots in complex scenarios is a massive and tedious task. It requires configuring the chatbot's conversation management system from the first round of dialogue until its end, considering all possible scenarios. Common approaches to building conversation management systems for specific scenarios include: 1) Having experienced professionals manually build the system based on their knowledge of the scenario. This involves configuring the chatbot's possible responses for each round of dialogue based on anticipated scenarios, thus creating a conversation management system; 2) Utilizing existing human-to-human dialogue data in the same scenario, statistically analyzing and manually summarizing relevant rules, and configuring possible responses for each round of dialogue, thus creating a chatbot conversation management system.
[0054] In complex scenarios, configuring conversation management for chatbots is a massive and tedious task. It requires setting corresponding robot responses based on human experience for various possible scenarios during the conversation, and then selecting a response from the robot after receiving the human's statement to complete the conversation management. This results in a huge workload for manual configuration and a high probability of selecting illogical responses when answering statements.
[0055] Table 1: Examples of Outbound Customer Service Robot Dialogues
[0056]
[0057] To implement the human-computer dialogue shown in Table 1, in most cases, it requires pre-configuring a session management system. A session management system is typically structured around nodes and paths. Nodes usually include the human's intent and the robot's response to that intent, while paths indicate how these nodes are connected. For the outbound customer service robot scenario in the example in Table 1, the session management system can be configured with two nodes: an initial node (human intent: none; response: Hello!) and another node (human intent: greeting; response: Are you Mr. Zhang San?). During the dialogue, the robot uses the initial node at the beginning (corresponding to the first round in Table 1). The human responds to the robot's "Hello!" with "Hello!" (corresponding to the second round in Table 1), and its intent is "greeting," connecting the two nodes and forming a path within the session management system. However, if the human responds to the robot's response in the first round with a different statement, such as "No," a new node for this situation needs to be pre-configured within the session management system, connecting the initial node and this new node.
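The node-and-path structure described above can be illustrated with a minimal sketch. The class names `Node` and `SessionGraph`, the node names, and the intent name "deny" are hypothetical and do not come from the disclosure; only the two example nodes and their responses are taken from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    human_intent: Optional[str]  # None for the initial node
    response: str                # what the robot says at this node


@dataclass
class SessionGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    # paths[src][human_intent] -> name of the node reached from src
    paths: Dict[str, Dict[Optional[str], str]] = field(default_factory=dict)

    def add_node(self, name: str, node: Node) -> None:
        self.nodes[name] = node
        self.paths.setdefault(name, {})

    def connect(self, src: str, dst: str) -> None:
        # A path is followed when the human's intent matches the target node.
        self.paths[src][self.nodes[dst].human_intent] = dst

    def respond(self, current: str, human_intent: str) -> Optional[str]:
        nxt = self.paths[current].get(human_intent)
        return self.nodes[nxt].response if nxt else None


graph = SessionGraph()
graph.add_node("initial", Node(None, "Hello!"))
graph.add_node("confirm_identity", Node("greeting", "Are you Mr. Zhang San?"))
graph.connect("initial", "confirm_identity")

print(graph.respond("initial", "greeting"))  # configured path
print(graph.respond("initial", "deny"))      # no node configured for this intent
```

The second call returns `None`, which mirrors the point made in the text: any human reply that was not anticipated needs its own pre-configured node and path.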
[0058] In complex scenarios, the number of dialogue rounds far exceeds two, so the space of nodes and paths that must be configured within the session management system is enormous. Whether the system is built manually from experience by professionals familiar with the specific scenario, or by statistically summarizing rules from real-person dialogues in that scenario, there are many serious challenges: only frequently occurring situations can be covered, illogical responses are likely to be generated, the workload of manual configuration is huge, and the cycle is long.
[0059] Therefore, providing an efficient and accurate session management method is an important problem that urgently needs to be solved.
[0060] To address the aforementioned issues, this disclosure provides a session management method which, as shown in Figure 1, includes:
[0061] S11. Input the acquired statement to be recognized into the intent recognition model and output the first intent data.
[0062] S12. Determine the subsequent intent data of the target intent data in the corresponding target intent stream data in the intent stream database based on the first intent data.
[0063] The intent stream database consists of multiple intent stream data, which in turn consists of multiple intent data in sequence. The subsequent intent data is the intent data that is adjacent to the target intent data and is ordered later in the intent stream data.
[0064] Specifically, for a specific conversation scenario, real-person dialogue data can be prepared. Based on the characteristics of the conversation scenario, a category system for conversational intents can be defined, assigning each round of dialogue text to a predefined category. Once the intent categories are defined, the number and definitions of categories will not change. Furthermore, since there are at least two parties in a conversation, when defining categories, a separate set of intent categories can be defined for each party, or all parties can share a single set of intent categories. This transforms the real-person dialogue into intent data corresponding to the corresponding intent categories.
[0065] In one example, the intent data in the intent streams that constitute the intent stream database can be obtained by applying the aforementioned intent recognition model to dialogue data between real people. Taking Table 1 as an example, the intent recognition model can be used to convert a dialogue into an intent stream composed of multiple intents: "Robot intent: Greeting – Real person intent: Greeting – Robot intent: Confirm identity – Real person intent: Confirm and inquire about purpose – Robot intent: State purpose – Real person intent: Confirm – … – Robot intent: End." After a large number of real-person dialogues have been converted into intent streams, an intent stream database is formed, and the number of intent streams in it can be increased at any time.
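As a rough sketch of how dialogues could be converted into intent streams, the snippet below maps each turn to an intent and appends the result to a library. The keyword lookup `toy_recognize` is a hypothetical stand-in for the trained intent recognition model, and the "speaker: intent" string encoding is an assumption made for illustration.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (speaker, utterance)


def to_intent_stream(dialogue: List[Turn],
                     recognize: Callable[[str], str]) -> List[str]:
    """Convert one dialogue into an ordered list of intents."""
    return [f"{speaker}: {recognize(text)}" for speaker, text in dialogue]


# Hypothetical stand-in for the intent recognition model.
TOY_INTENTS = {
    "Hello!": "Greeting",
    "Are you Mr. Zhang San?": "Confirm identity",
}


def toy_recognize(text: str) -> str:
    return TOY_INTENTS.get(text, "Other")


dialogue = [
    ("Robot", "Hello!"),
    ("Human", "Hello!"),
    ("Robot", "Are you Mr. Zhang San?"),
]

# The library is a plain list, so new intent streams can be appended at any time.
intent_stream_library: List[List[str]] = []
intent_stream_library.append(to_intent_stream(dialogue, toy_recognize))
print(intent_stream_library[0])
```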
[0066] S13. Match the subsequent intent data in the response dialogue database according to the preset mapping relationship to obtain at least one candidate dialogue data.
[0067] Specifically, for a given conversational scenario, after the robot's response intent categories and definitions are determined, a set of response dialogues is constructed for each intent, forming a response dialogue database. Response dialogues are typically compiled manually from prepared human dialogue data. The number of dialogues for each intent category is customizable, depending on the specific conversational scenario, the definition of the intent category system, and the amount of human dialogue data. Dialogues can be added to the response dialogue database at any time.
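One simple way to picture the response dialogue database and the preset mapping relationship is a dictionary keyed by intent category. This is only a sketch under that assumption; the function names are hypothetical, and the second "State purpose" entry is an invented example rather than text from the disclosure.

```python
from typing import Dict, List

# Intent category -> response dialogues compiled for that intent.
response_db: Dict[str, List[str]] = {
    "Confirm identity": ["Are you Mr. Zhang San?"],
    "State purpose": [
        "This is customer service. You have processed a transaction with us, "
        "and we would like to confirm some relevant information with you.",
    ],
}


def add_dialogue(intent: str, dialogue: str) -> None:
    """Dialogues can be added to the database at any time."""
    response_db.setdefault(intent, []).append(dialogue)


def match_candidates(subsequent_intent: str) -> List[str]:
    """Return the candidate dialogues mapped to the subsequent intent."""
    return list(response_db.get(subsequent_intent, []))


# Hypothetical extra dialogue, added only to show that the database can grow.
add_dialogue("State purpose", "I am calling to verify a recent transaction.")
print(match_candidates("State purpose"))
```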
[0068] S14. Input the first intent data and at least one of the candidate dialogue data into the dialogue selection model, and determine the target response dialogue data based on the output results.
[0069] In this embodiment, the intent of the statement to be identified is initially determined by inputting it into an intent recognition model. Based on this intent, a logically consistent subsequent intent is determined. The subsequent intent and at least one corresponding candidate dialogue are then input into a dialogue selection model, which selects a logically consistent target response dialogue. Intent streams derived from real dialogue data can include as many nodes and paths as possible, significantly exceeding what manual configuration and rule summarization can cover, especially for long-tail scenarios. Applying a dialogue selection model to configure target response dialogues at nodes greatly increases the likelihood of conforming to the dialogue context and avoids the difficulty of controlling dialogue logic during manual configuration. This greatly reduces the workload and time required for manual configuration, enabling accurate and rapid selection of logically consistent response dialogues based on the statement to be identified.
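The data flow of steps S11 to S14 can be sketched as a single function that chains four components. Every component is passed in as a callable, and the toy lambdas used in the usage example are hypothetical stand-ins for the trained models and databases, not the disclosed implementation.

```python
from typing import Callable, List, Optional


def manage_session(
    statement: str,
    recognize: Callable[[str], str],
    next_intent: Callable[[str], Optional[str]],
    match_candidates: Callable[[str], List[str]],
    select: Callable[[str, List[str]], Optional[str]],
) -> Optional[str]:
    first_intent = recognize(statement)          # S11: intent recognition
    subsequent = next_intent(first_intent)       # S12: look up the subsequent intent
    if subsequent is None:
        return None                              # no matching intent stream
    candidates = match_candidates(subsequent)    # S13: candidate dialogues
    return select(first_intent, candidates)      # S14: dialogue selection


# Toy stand-ins (assumptions for illustration only).
stream = ["Robot: Greeting", "Human: Greeting", "Robot: Confirm identity"]
reply = manage_session(
    "Hello!",
    recognize=lambda s: "Human: Greeting",
    next_intent=lambda i: stream[stream.index(i) + 1] if i in stream[:-1] else None,
    match_candidates=lambda i: ["Are you Mr. Zhang San?"] if i == "Robot: Confirm identity" else [],
    select=lambda intent, cands: cands[0] if cands else None,
)
print(reply)
```

Keeping the four stages as separate callables reflects the structure of the method: each stage can be replaced (for example, a larger intent stream database) without touching the others.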
[0070] According to the session management method provided in this disclosure, as shown in Figure 2, the intent recognition model is trained through the following steps:
[0071] S21. Input the sample of the statement to be identified into the pre-built initial intent recognition model, and output the dialogue intent category sample.
[0072] The sample statement to be identified has a pre-labeled dialogue intent category label.
[0073] Specifically, as shown in Figure 3, the pre-built initial intent recognition model is a classification model. It can adopt, but is not limited to, commonly used classification model structures such as TextCNN, BiLSTM, or BERT. It takes a sample of the statement to be recognized as input and outputs the most probable dialogue intent category among the predefined intent categories, i.e., the dialogue intent category sample. For example, if the sample of the statement to be recognized is "OK", the initial intent recognition model will output the dialogue intent category sample "Confirm".
[0074] S22. Calculate the first loss value based on the dialogue intent category sample and the pre-labeled dialogue intent category label.
[0075] S23. Adjust the first parameter of the initial intent recognition model according to the first loss value, and update the initial intent recognition model.
[0076] S24. If the first loss value is greater than the first preset threshold, return to re-execute the step of inputting the sample of the statement to be identified into the pre-built initial intent recognition model and outputting the dialogue intent category sample.
[0077] Specifically, when the first loss value is greater than the first preset threshold, the operation of steps S21-S23 can be returned and re-executed.
[0078] S25. If the first loss value is less than or equal to the first preset threshold, the updated initial intent recognition model is determined to be the intent recognition model.
[0079] This disclosure describes in detail the construction principle of the pre-built initial intent recognition model, and how to train the initial intent recognition model based on the sample of the statement to be recognized, ultimately obtaining a high-performance intent recognition model. This facilitates efficient and accurate recognition of the statement to be recognized based on the intent recognition model, obtaining the first intent data.
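The threshold-controlled loop of steps S21 to S25 can be sketched with a tiny bag-of-words softmax classifier written in plain Python. This is a deliberately small substitute for the TextCNN, BiLSTM, or BERT structures named above; the cross-entropy loss, the learning rate, the threshold value, the training samples other than "OK", and the `max_rounds` safety cap are all assumptions not stated in the disclosure.

```python
import math
from typing import Dict, List, Tuple


def softmax(zs: List[float]) -> List[float]:
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def train_intent_model(samples: List[Tuple[str, str]], labels: List[str],
                       threshold: float = 0.1, lr: float = 1.0,
                       max_rounds: int = 1000):
    vocab = sorted({w for text, _ in samples for w in text.lower().split()})
    weights: Dict[str, Dict[str, float]] = {
        lab: {w: 0.0 for w in vocab} for lab in labels}
    loss = float("inf")
    for _ in range(max_rounds):  # safety cap (assumption)
        grads = {lab: {w: 0.0 for w in vocab} for lab in labels}
        loss = 0.0
        for text, gold in samples:
            words = text.lower().split()
            # S21: run the sample through the current model.
            probs = softmax([sum(weights[lab][w] for w in words) for lab in labels])
            # S22: accumulate the mean cross-entropy loss (the first loss value).
            loss -= math.log(probs[labels.index(gold)]) / len(samples)
            for lab, p in zip(labels, probs):
                for w in words:
                    grads[lab][w] += (p - (lab == gold)) / len(samples)
        # S23: adjust the parameters according to the loss.
        for lab in labels:
            for w in vocab:
                weights[lab][w] -= lr * grads[lab][w]
        # S25: stop once the loss is at or below the preset threshold;
        # otherwise (S24) return to S21.
        if loss <= threshold:
            break
    return weights, loss


def predict_intent(weights, labels: List[str], text: str) -> str:
    words = [w for w in text.lower().split() if w in weights[labels[0]]]
    scores = [sum(weights[lab][w] for w in words) for lab in labels]
    return labels[scores.index(max(scores))]


LABELS = ["confirm", "deny", "greeting"]
SAMPLES = [("ok", "confirm"), ("yes", "confirm"), ("no", "deny"), ("hello", "greeting")]
weights, final_loss = train_intent_model(SAMPLES, LABELS)
print(predict_intent(weights, LABELS, "OK"))
```

The same loop shape applies to any model that exposes a forward pass, a loss, and a parameter update; only the three marked steps would change.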
[0080] According to the session management method provided in this disclosure, step S12 specifically includes:
[0081] S121. Match the corresponding target intent data in the intent stream database according to the first intent data.
[0082] S122. Determine the target intent stream data based on the target intent data.
[0083] S123. The intent data that is adjacent to the target intent data and is sorted later in the target intent stream data is determined as the subsequent intent data.
[0084] Specifically, since the intent stream database consists of multiple intent stream data, and each intent stream data consists of multiple intent data in a specific order, the first intent data can be matched with corresponding intent data in the intent stream database as the target intent data. Based on the target intent data, the intent stream data containing the target intent data is determined as the target intent stream data. In the target intent stream data, the intent data that is adjacent to the target intent data and is ranked later is searched and determined as the subsequent intent data.
[0085] In this embodiment of the disclosure, target intent data is matched in the intent stream database based on the first intent data, target intent stream data is determined based on the target intent data, and subsequent intent data is determined based on the target intent data and the target intent stream data. Since each intent data in the intent stream data is obtained by logical organization and sorting, the dialogue intent corresponding to the determined subsequent intent data is logically consistent with the dialogue intent represented by the first intent data in the preceding dialogue, thereby realizing the determination of logically consistent subsequent intent data.
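Steps S121 to S123 amount to a lookup over ordered lists, which the following sketch illustrates. Returning the first matching stream is an assumption made here for simplicity; the disclosure does not specify how to choose when the same intent appears in several intent streams.

```python
from typing import List, Optional, Tuple


def find_subsequent_intent(
    first_intent: str, intent_streams: List[List[str]]
) -> Tuple[Optional[List[str]], Optional[str]]:
    for stream in intent_streams:
        # S121: match the target intent data; the last element of a stream
        # is skipped because it has no later neighbour.
        for position, intent in enumerate(stream[:-1]):
            if intent == first_intent:
                # S122: this stream is the target intent stream.
                # S123: the adjacent, later intent is the subsequent intent.
                return stream, stream[position + 1]
    return None, None


streams = [
    ["Robot: Greeting", "Human: Greeting", "Robot: Confirm identity",
     "Human: Confirm and inquire about purpose", "Robot: State purpose"],
]
target_stream, subsequent = find_subsequent_intent(
    "Human: Confirm and inquire about purpose", streams)
print(subsequent)
```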
[0086] According to the session management method provided in this disclosure, as shown in Figure 4, the dialogue selection model is trained through the following steps:
[0087] S41. Input the dialogue selection sample into the initial dialogue selection model, and output the sample classification label and the probability corresponding to the sample classification label.
[0088] The dialogue selection sample includes intent data samples and candidate dialogue data samples, and the dialogue selection sample has pre-labeled pre-classification labels.
[0089] Specifically, a schematic diagram of the training process of the initial dialogue selection model is shown in Figure 5. The input is a string composed of a dialogue selection sample, which includes an intent data sample and a candidate dialogue data sample. The dialogue selection sample is segmented into words or characters and then input into the model. The initial dialogue selection model is a classification model, which can adopt, but is not limited to, commonly used classification model structures such as TextCNN, BiLSTM, or BERT. The output is the sample classification label and the probability of the sample classification label.
[0090] Taking Table 1 as an example, suppose rounds 1-4 of the dialogue have taken place and the task is to determine whether the dialogue chosen for round 5 is appropriate. The intent flow of rounds 1-4 is: Robot intent: Greeting – Human intent: Greeting – Robot intent: Confirm identity – Human intent: Confirm and inquire about purpose. If the dialogue chosen for round 5 is "This is customer service. You have processed a transaction with us, and we would like to confirm some relevant information with you," it conforms to the current dialogue logic and is marked as appropriate. Conversely, if the dialogue from round 3 is repeated, "Are you Mr. Zhang San?", it does not conform to the current dialogue logic and is marked as inappropriate. For a specific conversation scenario, a corpus of dialogue selection samples can be constructed from dialogue data between real people and used to train the initial dialogue selection model.
[0091] The selection of dialogue corpus in the dialogue selection sample can be decided based on experience. It can be either using the complete intent flow from the beginning of the dialogue to the current moment, or only using a part of the complete intent flow, i.e., the last intent.
[0092] The dialogue selection sample includes an intent data sample and a candidate dialogue data sample. The dialogue selection sample has a pre-labeled classification label, which can be set as needed. In one example, the pre-labeled classification labels may include "logical" and "illogical". In another example, they may also include labels such as "confirm identity" and "state purpose".
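To make the sample format concrete, the sketch below builds dialogue selection samples from an intent flow, a candidate dialogue, and a label, using either the complete intent flow or only the last intent as context. The `" | "` joiner, the `[SEP]` separator, and the dictionary layout are assumptions chosen for illustration; the disclosure only states that the input is a string built from the intent data and the candidate dialogue.

```python
from typing import Dict, List


def build_sample(intent_flow: List[str], candidate: str, label: str,
                 full_flow: bool = True) -> Dict[str, str]:
    # Use the whole intent flow so far, or only its last intent.
    context = intent_flow if full_flow else intent_flow[-1:]
    text = " | ".join(context) + " [SEP] " + candidate
    return {"text": text, "label": label}


flow = [
    "Robot: Greeting",
    "Human: Greeting",
    "Robot: Confirm identity",
    "Human: Confirm and inquire about purpose",
]

positive = build_sample(
    flow,
    "This is customer service. You have processed a transaction with us, "
    "and we would like to confirm some relevant information with you.",
    "logical",
)
negative = build_sample(flow, "Are you Mr. Zhang San?", "illogical", full_flow=False)

print(positive["label"], "|", negative["text"])
```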
[0093] S42. Calculate the second loss value based on the sample classification label, the probability corresponding to the sample classification label, and the pre-labeled classification label.
[0094] S43. Adjust the second parameter of the initial dialogue selection model according to the second loss value, and update the initial dialogue selection model.
[0095] S44. If the second loss value is greater than the second preset threshold, return to re-execute the step of inputting the dialogue selection sample into the initial dialogue selection model and outputting the sample classification label and the probability corresponding to the sample classification label.
[0096] Specifically, when the second loss value is greater than the second preset threshold, steps S41-S43 can be re-executed.
[0097] S45. If the second loss value is less than or equal to the second preset threshold, the updated initial dialogue selection model is determined to be the dialogue selection model.
[0098] This disclosure describes in detail the construction principle of the pre-built initial dialogue selection model, and how to train the initial dialogue selection model based on dialogue selection samples to obtain a dialogue selection model with better performance. This facilitates the efficient and accurate identification of the first intent data and candidate dialogue data based on the dialogue selection model to obtain the target response dialogue data.
[0099] According to the session management method provided in this disclosure, step S14 may specifically include:
[0100] S141. For each candidate dialogue data, the first intent data and the candidate dialogue data are taken together as one item of data to be identified, and at least one item of data to be identified is obtained.
[0101] Specifically, in one example, one candidate dialogue data is obtained based on the first intent data and the preset mapping relationship, and the first intent data and the candidate dialogue data are taken together as one item of data to be identified.
[0102] In another example, multiple candidate dialogue data are obtained based on the first intent data and a preset mapping relationship. Each candidate dialogue data is combined with the first intent data as one item of data to be identified, resulting in multiple items of data to be identified.
[0103] S142. Input the at least one item of data to be identified into the dialogue selection model one by one to obtain at least one classification label and the probability corresponding to the classification label.
[0104] Specifically, in one example, there are multiple items of data to be identified. Each item is input into the dialogue selection model to obtain its classification label and the probability of its being identified as that classification label.
[0105] S143. According to preset rules, select the classification labels of a specified category as target response labels, obtaining at least one target response label.
[0106] Specifically, in one example, step S142 yields multiple "logically consistent" classification labels and multiple "logically inconsistent" classification labels. According to the preset rules, the "logically consistent" classification labels can be selected as the target response labels.
[0107] In another example, step S142 yields multiple "confirm identity" classification labels and multiple "state purpose" classification labels. According to the preset rules, the "state purpose" classification labels can be selected as the target response labels.
[0108] S144. Determine the target response dialogue data based on the probability corresponding to each of the at least one target response label and on the corresponding candidate dialogue data.
[0109] Specifically, in one example, after the "logically consistent" classification labels are selected as the target response labels according to the preset rules, the target response dialogue data can be further determined from the candidate dialogue data based on the probabilities corresponding to the multiple "logically consistent" classification labels.
[0110] In another example, after the "state purpose" classification labels are selected as the target response labels according to the preset rules, the target response dialogue data can be further determined from the candidate dialogue data based on the probabilities corresponding to the multiple "state purpose" classification labels.
[0111] In this embodiment, each piece of candidate dialogue data is combined with the first intent data to obtain at least one item of data to be identified. The at least one item of data to be identified is then input into the dialogue selection model to obtain at least one classification label and its corresponding probability. Target response labels are selected from the classification labels according to preset rules. Finally, the target response dialogue data is determined from the candidate dialogue data based on the probability corresponding to each target response label. Because both the label category and its probability are taken into account, accurate target response dialogue data can be obtained.
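Steps S141 to S143 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names, the `toy_model` lookup table, the intent name "answer call", and the sample sentences are all hypothetical, and a real dialogue selection model would be a trained classifier rather than a fixed table.

```python
from typing import Callable, List, Tuple

# Assumed interface: the model maps (first intent, candidate) to (label, probability).
SelectionModel = Callable[[str, str], Tuple[str, float]]


def build_items(first_intent: str, candidates: List[str]) -> List[Tuple[str, str]]:
    """S141: pair the first intent data with each candidate dialogue data."""
    return [(first_intent, candidate) for candidate in candidates]


def classify(
    items: List[Tuple[str, str]], model: SelectionModel
) -> List[Tuple[str, str, float]]:
    """S142: feed the items to the model one by one."""
    results = []
    for intent, candidate in items:
        label, probability = model(intent, candidate)
        results.append((candidate, label, probability))
    return results


def filter_by_label(
    results: List[Tuple[str, str, float]], target_label: str
) -> List[Tuple[str, str, float]]:
    """S143: keep only results carrying the specified category label."""
    return [r for r in results if r[1] == target_label]


def toy_model(intent: str, candidate: str) -> Tuple[str, float]:
    # Hypothetical stand-in for a trained model: a fixed lookup table.
    table = {
        "May I confirm who I am speaking with?": ("confirm identity", 0.9),
        "I am calling about your recent order.": ("state purpose", 0.8),
        "This call is about a delivery.": ("state purpose", 0.7),
    }
    return table[candidate]


candidates = [
    "May I confirm who I am speaking with?",
    "I am calling about your recent order.",
    "This call is about a delivery.",
]
items = build_items("answer call", candidates)
targets = filter_by_label(classify(items, toy_model), "state purpose")
```

Here `targets` keeps only the two "state purpose" results together with their probabilities, which is the input that step S144 needs.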
[0112] According to the session management method provided in this disclosure, step S144 specifically includes:
[0113] S145. Among the at least one target response label, the candidate dialogue data corresponding to the target response label with the highest probability is determined as the target response dialogue data.
[0114] Specifically, in one example, after the "logically consistent" classification label is selected as the target response label, three items A, B, and C all carry the "logically consistent" label, with corresponding probabilities of 0.8, 0.6, and 0.6, respectively. Since A's probability of 0.8 is the highest of the three, the candidate dialogue data corresponding to A is used as the target response dialogue data.
[0115] In another example, the dialogue selection model outputs three classification results: D labeled "confirm identity", and E and F both labeled "state purpose", with probabilities of 0.9, 0.8, and 0.7, respectively. If "state purpose" is selected as the target response label according to the preset rules, D is excluded even though its probability is higher than those of E and F, because D does not carry the "state purpose" label. Only the probabilities of E and F are compared, and because E's probability is higher than F's, the candidate dialogue data corresponding to E is selected as the target response dialogue data.
[0116] In this embodiment of the disclosure, among the at least one target response label, the candidate dialogue data corresponding to the target response label with the highest probability is determined as the target response dialogue data, so that the target response dialogue data is the best choice under the target response label and is logically sound.
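The selection rule of step S145 can be illustrated with the two numeric examples above. The sketch below is an assumption-laden illustration: the `Scored` record and the `select_target_response` function are hypothetical names, and the letters stand in for the candidate dialogue data. The source does not say how ties are broken; this sketch simply keeps the first maximum.

```python
from typing import List, NamedTuple, Optional


class Scored(NamedTuple):
    candidate: str      # identifier standing in for candidate dialogue data
    label: str          # classification label from the dialogue selection model
    probability: float  # probability corresponding to that label


def select_target_response(results: List[Scored], target_label: str) -> Optional[str]:
    """Return the candidate with the highest probability among those
    carrying the target response label, or None if no result carries it."""
    matching = [r for r in results if r.label == target_label]
    if not matching:
        return None
    return max(matching, key=lambda r: r.probability).candidate


# Example from paragraph [0114]: A, B, C all share the target label.
abc = [
    Scored("A", "logically consistent", 0.8),
    Scored("B", "logically consistent", 0.6),
    Scored("C", "logically consistent", 0.6),
]
chosen_abc = select_target_response(abc, "logically consistent")

# Example from paragraph [0115]: D has the highest probability but the wrong label.
results = [
    Scored("D", "confirm identity", 0.9),
    Scored("E", "state purpose", 0.8),
    Scored("F", "state purpose", 0.7),
]
chosen = select_target_response(results, "state purpose")
print(chosen_abc, chosen)  # prints "A E"
```

Filtering by label before taking the maximum is what keeps D out of the comparison in the second example.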
[0117] The session management device provided in the embodiments of this disclosure is described below. The session management device described below and the session management method described above may be read in cross-reference with each other.
[0118] This disclosure also provides a session management device. As shown in Figure 6, it includes:
[0119] The intent recognition unit 61 is used to input the acquired statement to be recognized into the intent recognition model and output the first intent data.
[0120] The first matching unit 62 is used to determine, based on the first intent data, the subsequent intent data of the target intent data in the corresponding target intent stream data in the intent stream database.
[0121] The intent stream database consists of multiple intent stream data, each of which consists of multiple intent data arranged in order. The subsequent intent data is the intent data that is adjacent to the target intent data and ordered after it in the intent stream data.
[0122] The second matching unit 63 is used to match, based on the subsequent intent data, in the response dialogue database according to a preset mapping relationship to obtain at least one candidate dialogue data.
[0123] The determining unit 64 is used to input the first intent data and at least one of the candidate dialogue data into the dialogue selection model, and determine the target response dialogue data based on the output result.
[0124] In this embodiment, the intent of the statement to be recognized is first determined by inputting the statement into the intent recognition model. Based on this intent, a logically consistent subsequent intent is determined, and at least one candidate dialogue corresponding to the subsequent intent is matched. The first intent data and the at least one candidate dialogue are then input into the dialogue selection model, which selects a logically consistent target response dialogue. Intent streams derived from real dialogue data can cover as many nodes and paths as possible, far exceeding the scale achievable through manual configuration and rule summarization, and in particular covering more long-tail scenarios. Using the dialogue selection model to choose the target response dialogue at each node greatly increases the likelihood that the response fits the dialogue context, and avoids the difficulty of controlling dialogue logic under manual configuration. This greatly reduces the workload and cycle of manual configuration and enables accurate, rapid selection of a logically consistent response dialogue for the statement to be recognized.
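The way the four units cooperate can be sketched as a simple composition. This is only an illustrative wiring under assumptions: the `SessionManager` class, its field names, and the hard-coded stubs are hypothetical, and the behavior when no subsequent intent or no candidate is found (returning `None`) is an assumption that the source does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SessionManager:
    # Each field stands in for one unit of the device; all are assumed callables.
    recognize_intent: Callable[[str], str]                       # intent recognition unit 61
    find_subsequent: Callable[[str], Optional[str]]              # first matching unit 62
    match_candidates: Callable[[str], List[str]]                 # second matching unit 63
    choose_response: Callable[[str, List[str]], Optional[str]]   # determining unit 64

    def respond(self, statement: str) -> Optional[str]:
        first_intent = self.recognize_intent(statement)
        subsequent = self.find_subsequent(first_intent)
        if subsequent is None:
            return None  # assumption: no reply when there is no subsequent intent
        candidates = self.match_candidates(subsequent)
        if not candidates:
            return None  # assumption: no reply when nothing is mapped
        # The first intent data, not the subsequent intent, goes to the selector.
        return self.choose_response(first_intent, candidates)


# Toy wiring with hard-coded stubs, for illustration only.
next_intent = {"greet": "state purpose"}
responses = {"state purpose": ["I am calling about your order.", "Nice weather today."]}
manager = SessionManager(
    recognize_intent=lambda text: "greet" if "hello" in text.lower() else "other",
    find_subsequent=next_intent.get,
    match_candidates=lambda intent: responses.get(intent, []),
    choose_response=lambda intent, cands: cands[0],
)
reply = manager.respond("Hello?")
```

Keeping each unit behind a callable mirrors the statement that the units may or may not be physically separate: any of them can be swapped for a trained model or a database lookup without changing `respond`.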
[0125] According to embodiments of this disclosure, a session management device is provided, wherein the intent recognition model is trained through the following steps:
[0126] Input the sample of the statement to be identified into the pre-built initial intent recognition model, and output the dialogue intent category sample, wherein the sample of the statement to be identified has a pre-labeled dialogue intent category label;
[0127] Calculate the first loss value based on the dialogue intent category sample and the pre-labeled dialogue intent category label;
[0128] The first parameter of the initial intent recognition model is adjusted according to the first loss value, and the initial intent recognition model is updated.
[0129] If the first loss value is greater than the first preset threshold, return to re-execute the step of inputting the sample of the statement to be identified into the pre-built initial intent recognition model and outputting the dialogue intent category sample;
[0130] If the first loss value is less than or equal to the first preset threshold, the updated initial intent recognition model is determined to be the intent recognition model.
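The loop described by these steps (forward pass, loss, parameter update, threshold check) can be sketched on a toy problem. Everything concrete here is an assumption made for illustration: the source does not specify the model architecture, the loss function, or the optimizer, so the sketch uses a two-intent logistic classifier on one-dimensional features, mean cross-entropy as the first loss value, plain gradient descent, and a `max_iters` safety cap that is not part of the described steps.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_until_threshold(
    xs: List[float],
    ys: List[int],
    threshold: float,
    lr: float = 0.5,
    max_iters: int = 10_000,
) -> Tuple[float, float, float]:
    """Return (w, b, last_loss) for a toy two-intent logistic classifier."""
    w, b = 0.0, 0.0  # parameters of the "initial" model
    loss = float("inf")
    for _ in range(max_iters):  # safety cap, an addition to the described steps
        # Forward pass: predicted probability of intent 1 for each sample.
        ps = [sigmoid(w * x + b) for x in xs]
        # First loss value: mean cross-entropy against the pre-labeled intents.
        loss = -sum(
            y * math.log(p) + (1 - y) * math.log(1 - p) for p, y in zip(ps, ys)
        ) / len(xs)
        # Adjust the parameters according to the loss (one gradient descent step).
        grad_w = sum((p - y) * x for p, y, x in zip(ps, ys, xs)) / len(xs)
        grad_b = sum(p - y for p, y in zip(ps, ys)) / len(xs)
        w, b = w - lr * grad_w, b - lr * grad_b
        # Stop once the loss is at or below the preset threshold;
        # otherwise return to the forward pass.
        if loss <= threshold:
            break
    return w, b, loss


xs = [-2.0, -1.0, 1.0, 2.0]  # toy features of the sample statements
ys = [0, 0, 1, 1]            # pre-labeled dialogue intent categories
w, b, last_loss = train_until_threshold(xs, ys, threshold=0.2)
```

The order inside the loop follows the text: the loss is computed, the parameters are updated, and only then is the loss compared with the threshold, so the returned parameters are those of the updated model.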
[0131] According to an embodiment of the present disclosure, a session management device is provided, wherein a first matching unit 62 is specifically used for:
[0132] Match the corresponding target intent data in the intent stream database based on the first intent data;
[0133] Based on the target intent data, determine the target intent stream data;
[0134] The intent data that is adjacent to the target intent data and is ranked later in the target intent stream data is determined as the subsequent intent data.
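The lookup performed by the first matching unit can be sketched as follows, representing each intent stream as an ordered list of intent names. The function name and the sample streams are hypothetical. The source does not say how to choose when the same intent appears in several streams; this sketch assumes the first stream in which the intent has a successor is used.

```python
from typing import Optional, Sequence


def find_subsequent_intent(
    first_intent: str, intent_streams: Sequence[Sequence[str]]
) -> Optional[str]:
    """Return the intent that directly follows the matched target intent."""
    for stream in intent_streams:                 # candidate target intent stream
        for i, intent in enumerate(stream):
            if intent == first_intent:            # target intent data matched
                if i + 1 < len(stream):           # it has an adjacent later intent
                    return stream[i + 1]
    return None                                   # no match, or matched only at a stream end


streams = [
    ["greet", "confirm identity", "state purpose"],
    ["greet", "ask price"],
]
nxt = find_subsequent_intent("confirm identity", streams)
print(nxt)  # prints "state purpose"
```

An intent that only ever appears at the end of a stream, such as "state purpose" here, has no subsequent intent, and the function returns `None`.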
[0135] According to embodiments of this disclosure, a session management device is provided, wherein the dialogue selection model is trained through the following steps:
[0136] Input the dialogue selection sample into the initial dialogue selection model, and output the sample classification label and the probability corresponding to the sample classification label, wherein the dialogue selection sample includes an intent data sample and a candidate dialogue data sample, and the dialogue selection sample has a pre-labeled classification label;
[0137] The second loss value is calculated based on the sample classification label, the probability corresponding to the sample classification label, and the pre-labeled classification label;
[0138] The second parameter of the initial dialogue selection model is adjusted according to the second loss value, and the initial dialogue selection model is updated;
[0139] If the second loss value is greater than the second preset threshold, return to the step of inputting the dialogue selection sample into the initial dialogue selection model and outputting the sample classification label and the probability corresponding to the sample classification label;
[0140] If the second loss value is less than or equal to the second preset threshold, the updated initial dialogue selection model is determined to be the dialogue selection model.
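One way the second loss value could be computed from the three named inputs is sketched below. The source does not name a loss function, so this is an assumption: it treats the task as two-class (for example "logically consistent" versus "logically inconsistent") and uses the negative log-likelihood of the pre-labeled class. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def second_loss(pred_label: str, prob: float, gold_label: str, eps: float = 1e-12) -> float:
    """Negative log-likelihood of the pre-labeled class, assuming two classes."""
    # If the predicted label matches the pre-labeled one, the model assigns it
    # probability `prob`; otherwise the pre-labeled class gets the remainder.
    p_gold = prob if pred_label == gold_label else 1.0 - prob
    return -math.log(max(p_gold, eps))  # eps guards against log(0)


def mean_second_loss(outputs: List[Tuple[str, float]], gold_labels: List[str]) -> float:
    """Average the per-sample loss over a batch of dialogue selection samples."""
    losses = [second_loss(label, p, gold) for (label, p), gold in zip(outputs, gold_labels)]
    return sum(losses) / len(losses)


# Two samples with the same model output but different pre-labeled classes.
outputs = [("logically consistent", 0.8), ("logically consistent", 0.8)]
gold = ["logically consistent", "logically inconsistent"]
loss_value = mean_second_loss(outputs, gold)
```

A confident prediction that agrees with the pre-labeled class contributes a small loss (about 0.22 here), while the same confidence on a wrong label contributes a much larger one (about 1.61), which is what drives the parameter adjustment in the next step.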
[0141] According to an embodiment of this disclosure, a session management apparatus is provided, wherein the determining unit 64 is specifically used for:
[0142] For each piece of candidate dialogue data, the first intent data and that candidate dialogue data are taken as one item of data to be identified, obtaining at least one item of data to be identified;
[0143] The at least one item of data to be identified is input into the dialogue selection model one by one to obtain at least one classification label and the probability corresponding to the classification label;
[0144] According to preset rules, classification labels of a specified category are selected as target response labels, obtaining at least one target response label;
[0145] The target response dialogue data is determined based on the probability corresponding to the at least one target response label and on the corresponding candidate dialogue data.
[0146] According to an embodiment of this disclosure, a session management apparatus is provided, wherein the determining unit 64 is specifically used for:
[0147] Among the at least one target response label, the candidate dialogue data corresponding to the target response label with the highest probability is determined as the target response dialogue data.
[0148] Figure 7 is a schematic diagram illustrating the physical structure of an electronic device. As shown in Figure 7, the electronic device may include a processor 710, a communications interface 720, a memory 730, and a communication bus 740, wherein the processor 710, the communications interface 720, and the memory 730 communicate with each other through the communication bus 740. The processor 710 can call logical instructions in the memory 730 to execute a session management method, which includes: inputting the acquired statement to be recognized into an intent recognition model and outputting first intent data; determining, based on the first intent data, subsequent intent data of the target intent data in the corresponding target intent stream data in an intent stream database, wherein the intent stream database consists of multiple intent stream data, each intent stream data consists of multiple intent data in order, and the subsequent intent data is the intent data that is adjacent to the target intent data and ordered after it in the intent stream data; matching, based on the subsequent intent data, in a response dialogue database according to a preset mapping relationship to obtain at least one candidate dialogue data; and inputting the first intent data and the at least one candidate dialogue data into a dialogue selection model, and determining the target response dialogue data based on the output result.
[0149] Furthermore, the logical instructions in the aforementioned memory 730 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of this disclosure, essentially, or the parts that contribute to the prior art, or parts of the technical solutions, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this disclosure. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0150] In another aspect, this disclosure also provides a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium. The computer program includes program instructions, and when the program instructions are executed by a computer, the computer can execute the session management method provided by the above method embodiments. The method includes: inputting an acquired statement to be recognized into an intent recognition model and outputting first intent data; determining, based on the first intent data, subsequent intent data of the target intent data in the corresponding target intent stream data in an intent stream database, wherein the intent stream database consists of multiple intent stream data, each intent stream data consists of multiple intent data in order, and the subsequent intent data is the intent data that is adjacent to the target intent data and ordered after it in the intent stream data; matching, based on the subsequent intent data, in a response dialogue database according to a preset mapping relationship to obtain at least one candidate dialogue data; and inputting the first intent data and the at least one candidate dialogue data into a dialogue selection model, and determining the target response dialogue data based on the output result.
[0151] In yet another aspect, this disclosure also provides a non-transitory computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the session management method provided by the above method embodiments. The method includes: inputting an acquired statement to be recognized into an intent recognition model and outputting first intent data; determining, based on the first intent data, subsequent intent data of the target intent data in the corresponding target intent stream data in an intent stream database, wherein the intent stream database consists of multiple intent stream data, each intent stream data consists of multiple intent data in order, and the subsequent intent data is the intent data that is adjacent to the target intent data and ordered after it in the intent stream data; matching, based on the subsequent intent data, in a response dialogue database according to a preset mapping relationship to obtain at least one candidate dialogue data; and inputting the first intent data and the at least one candidate dialogue data into a dialogue selection model, and determining the target response dialogue data based on the output result.
[0152] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0153] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0154] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this disclosure, and are not intended to limit them. Although this disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this disclosure.
Claims
1. A session management method, characterized by comprising:
inputting an acquired statement to be recognized into an intent recognition model, and outputting first intent data;
determining, based on the first intent data, subsequent intent data of target intent data in corresponding target intent stream data in an intent stream database, wherein the intent stream database consists of multiple intent stream data, each intent stream data consists of multiple intent data in order, and the subsequent intent data is the intent data that is adjacent to the target intent data and ordered after it in the intent stream data;
matching, based on the subsequent intent data, in a response dialogue database according to a preset mapping relationship to obtain at least one candidate dialogue data;
inputting the first intent data and the at least one candidate dialogue data into a dialogue selection model, and determining target response dialogue data based on the output result;
wherein the step of inputting the first intent data and the at least one candidate dialogue data into the dialogue selection model and determining the target response dialogue data based on the output result comprises:
for each candidate dialogue data, taking the first intent data and the candidate dialogue data as one item of data to be identified, obtaining at least one item of data to be identified;
inputting the at least one item of data to be identified into the dialogue selection model one by one to obtain at least one classification label and the probability corresponding to the classification label;
selecting, according to preset rules, classification labels of a specified category as target response labels, obtaining at least one target response label;
determining the target response dialogue data based on the probability corresponding to the at least one target response label and on the corresponding candidate dialogue data.
2. The session management method according to claim 1, characterized in that the intent recognition model is trained through the following steps:
inputting a sample of the statement to be recognized into a pre-built initial intent recognition model, and outputting a dialogue intent category sample, wherein the sample of the statement to be recognized has a pre-labeled dialogue intent category label;
calculating a first loss value based on the dialogue intent category sample and the pre-labeled dialogue intent category label;
adjusting a first parameter of the initial intent recognition model according to the first loss value, and updating the initial intent recognition model;
if the first loss value is greater than a first preset threshold, returning to re-execute the step of inputting the sample of the statement to be recognized into the pre-built initial intent recognition model and outputting the dialogue intent category sample;
if the first loss value is less than or equal to the first preset threshold, determining the updated initial intent recognition model to be the intent recognition model.
3. The session management method according to claim 2, characterized in that the step of determining, based on the first intent data, the subsequent intent data of the target intent data in the corresponding target intent stream data in the intent stream database comprises:
matching the corresponding target intent data in the intent stream database based on the first intent data;
determining the target intent stream data based on the target intent data;
determining, as the subsequent intent data, the intent data that is adjacent to the target intent data and ordered after it in the target intent stream data.
4. The session management method according to claim 3, characterized in that the dialogue selection model is trained through the following steps:
inputting a dialogue selection sample into an initial dialogue selection model, and outputting a sample classification label and the probability corresponding to the sample classification label, wherein the dialogue selection sample includes an intent data sample and a candidate dialogue data sample, and the dialogue selection sample has a pre-labeled classification label;
calculating a second loss value based on the sample classification label, the probability corresponding to the sample classification label, and the pre-labeled classification label;
adjusting a second parameter of the initial dialogue selection model according to the second loss value, and updating the initial dialogue selection model;
if the second loss value is greater than a second preset threshold, returning to the step of inputting the dialogue selection sample into the initial dialogue selection model and outputting the sample classification label and the probability corresponding to the sample classification label;
if the second loss value is less than or equal to the second preset threshold, determining the updated initial dialogue selection model to be the dialogue selection model.
5. The session management method according to claim 1, characterized in that the step of determining the target response dialogue data based on the probability corresponding to the at least one target response label and on the corresponding candidate dialogue data comprises:
among the at least one target response label, determining the candidate dialogue data corresponding to the target response label with the highest probability as the target response dialogue data.
6. A session management device, characterized by comprising:
an intent recognition unit, configured to input an acquired statement to be recognized into an intent recognition model and output first intent data;
a first matching unit, configured to determine, based on the first intent data, subsequent intent data of target intent data in corresponding target intent stream data in an intent stream database, wherein the intent stream database consists of multiple intent stream data, each intent stream data consists of multiple intent data in order, and the subsequent intent data is the intent data that is adjacent to the target intent data and ordered after it in the intent stream data;
a second matching unit, configured to match, based on the subsequent intent data, in a response dialogue database according to a preset mapping relationship to obtain at least one candidate dialogue data;
a determining unit, configured to input the first intent data and the at least one candidate dialogue data into a dialogue selection model, and determine target response dialogue data based on the output result;
wherein the determining unit is specifically configured to:
for each candidate dialogue data, take the first intent data and the candidate dialogue data as one item of data to be identified, obtaining at least one item of data to be identified;
input the at least one item of data to be identified into the dialogue selection model one by one to obtain at least one classification label and the probability corresponding to the classification label;
select, according to preset rules, classification labels of a specified category as target response labels, obtaining at least one target response label;
determine the target response dialogue data based on the probability corresponding to the at least one target response label and on the corresponding candidate dialogue data.
7. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the steps of the session management method as described in any one of claims 1 to 5.
8. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the session management method as described in any one of claims 1 to 5.
9. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the session management method as described in any one of claims 1 to 5.