Intention recognition method and device, electronic equipment, storage medium and program product

The method identifies the semantic relationship and the intent change between the current and previous questions in a multi-turn dialogue and generates a target intent accordingly. This addresses inaccurate intent recognition in multi-turn dialogue scenarios and improves recognition accuracy and user experience.

CN122311368A · Pending · Publication Date: 2026-06-30 · BEIJING XIAOMI MOBILE SOFTWARE CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
BEIJING XIAOMI MOBILE SOFTWARE CO LTD
Filing Date
2024-12-31
Publication Date
2026-06-30

Smart Images

  • Figure CN122311368A_ABST

Abstract

This disclosure relates to an intent recognition method, apparatus, electronic device, storage medium, and program product. The intent recognition method includes: acquiring the content of a first question asked at the Nth time and the content of a second question asked at the (N-1)th time, where N is an integer greater than 1; determining, based on the first question content and the second question content, whether a semantic association exists between the first question content and the second question content, and determining whether a first intent and a second intent are the same, wherein the first intent is the intent corresponding to the first question content, and the second intent is the intent corresponding to the second question content; in response to the existence of a semantic association between the first question content and the second question content, and the first intent and the second intent being different, generating a target intent based on the first intent and the second question content, and using the target intent as the intent for the Nth question. The method provided by the embodiments of this disclosure can improve the accuracy of intent recognition.

Description

Technical Field

[0001] This disclosure relates to the field of artificial intelligence, and in particular to methods, apparatuses, electronic devices, storage media, and program products for intent recognition.

Background Technology

[0002] In the field of machine learning, especially in Natural Language Processing (NLP), multi-turn dialogue scenarios are quite common. Taking task-oriented dialogue as an example, in order to efficiently assist users in completing tasks or achieving goals in multi-turn dialogue scenarios, it is usually necessary to identify the user's intent when asking questions in the scenario.

[0003] In related technologies, intent recognition in multi-turn dialogues produces inaccurate results.

Summary of the Invention

[0004] To overcome the problems existing in related technologies, this disclosure provides an intent recognition method, apparatus, electronic device, storage medium, and program product.

[0005] According to a first aspect of the present disclosure, an intent recognition method is provided, comprising: acquiring a first question content of an Nth question and a second question content of an (N-1)th question, wherein N is an integer greater than 1; determining, based on the first question content and the second question content, whether there is a semantic association between the first question content and the second question content, and determining whether a first intent and a second intent are the same, wherein the first intent is the intent corresponding to the first question content, and the second intent is the intent corresponding to the second question content; in response to the existence of a semantic association between the first question content and the second question content, and the first intent being different from the second intent, generating a target intent based on the first intent and the second question content, and using the target intent as the intent of the Nth question.

[0006] In one implementation, determining whether there is a semantic relationship between the first question content and the second question content, and determining whether the first intent and the second intent are the same, includes: obtaining the content of the (N-1)th question and answer, wherein the content of the (N-1)th question and answer includes the second question content and the corresponding response content; invoking a target model to extract semantic features from the first question content and the content of the (N-1)th question and answer, obtaining a first semantic feature corresponding to the first question content and a second semantic feature corresponding to the content of the (N-1)th question and answer; and based on the first semantic feature and / or the second semantic feature, determining whether there is a semantic relationship between the first question content and the second question content, and determining whether the first intent and the second intent are the same.

[0007] In one implementation, the method further includes: in response to determining that there is a semantic association between the first question content and the second question content, and that the first intent is the same as the second intent, determining the first intent or the second intent as the intent of the user's Nth question.

[0008] In one implementation, the target model is trained as follows: training samples are input into an initial model, wherein each training sample includes a question-answer pair, the question-answer pair including a question and a corresponding response, each training sample corresponds to an intent, and the training samples are obtained based on question-answer pair segmentation from multi-turn dialogues; semantic features are extracted from the training samples to obtain semantic features corresponding to the training samples; the initial model is invoked to classify the training samples based on the semantic features corresponding to the training samples, wherein the classification of the training samples includes semantic classification and / or intent classification; based on the classification results of the training samples by the initial model, a corresponding loss function is determined; and the initial model is adjusted based on the loss function to obtain the target model.
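As a rough illustration of the training objective in [0008], the sketch below assumes the initial model has two classification heads (one for semantic association, one for intent) and that the loss is a weighted sum of two cross-entropy terms. The weights `alpha` and `beta`, the function names, and the use of cross-entropy are assumptions; the disclosure only states that a loss function is determined from the classification results. Parameter updates (e.g. by gradient descent) are not shown.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Cross-entropy of one sample: -log(softmax(logits)[label])."""
    # Log-sum-exp with the maximum subtracted, for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def joint_loss(
    semantic_logits: Sequence[float],
    semantic_label: int,
    intent_logits: Sequence[float],
    intent_label: int,
    alpha: float = 1.0,  # hypothetical weight of the semantic-classification loss
    beta: float = 1.0,   # hypothetical weight of the intent-classification loss
) -> float:
    """Weighted sum of the semantic and intent classification losses."""
    return (alpha * cross_entropy(semantic_logits, semantic_label)
            + beta * cross_entropy(intent_logits, intent_label))
```

With uniform logits the loss reduces to log of the class count, e.g. `joint_loss([0.0, 0.0], 1, [0.0, 0.0, 0.0], 2)` equals log 2 + log 3. Setting `alpha` or `beta` to 0 corresponds to the "and/or" case where only one classification is trained.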

[0009] In one implementation, generating a target intent based on the first intent and the second question content includes: calling a preset Large Language Model (LLM) to rewrite the first question content based on the second question content to obtain rewritten question content; determining the intent corresponding to the rewritten question content, and identifying the intent as the target intent.
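A minimal sketch of [0009], assuming the preset LLM and the intent classifier are available as plain callables. `llm`, `classify_intent`, `REWRITE_PROMPT`, and the prompt wording are hypothetical placeholders, not an API named in the disclosure.

```python
from typing import Callable

# Hypothetical rewriting prompt; the disclosure does not give its wording.
REWRITE_PROMPT = (
    "Previous question: {previous}\n"
    "Current question: {current}\n"
    "Rewrite the current question so that it is self-contained, "
    "filling in any information it omits from the previous question. "
    "Output only the rewritten question."
)


def generate_target_intent(
    first_question: str,
    second_question: str,
    llm: Callable[[str], str],
    classify_intent: Callable[[str], str],
) -> str:
    """Rewrite the Nth question using the (N-1)th question, then classify it."""
    prompt = REWRITE_PROMPT.format(previous=second_question, current=first_question)
    rewritten = llm(prompt).strip()
    # The intent of the rewritten question content is used as the target intent.
    return classify_intent(rewritten)
```

For example, with a stub LLM that turns "What about Shanghai?" into "What is the weather in Shanghai tomorrow?", a weather classifier would then return a weather-query intent rather than a generic city-information intent.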

[0010] In one implementation, after the first question content of the Nth question and the second question content of the (N-1)th question input by the user are obtained, the method further includes: determining the question-and-answer pattern of the Nth question based on the first question content, and determining that the question-and-answer pattern of the Nth question is another pattern, wherein the other question-and-answer pattern is different from a preset question-and-answer pattern.

[0011] In one implementation, determining the question-and-answer pattern of the Nth question based on the content of the first question includes: determining a first keyword in the content of the first question; if the first keyword is a keyword in a preset matching template, then determining the question-and-answer pattern corresponding to the Nth question as a preset question-and-answer pattern; if the first keyword is not a keyword in the preset matching template, then determining the question-and-answer pattern corresponding to the Nth question as another question-and-answer pattern, wherein the other question-and-answer pattern is different from the preset question-and-answer pattern.

[0012] In one implementation, the preset question-and-answer pattern corresponds to a preset intent correction rule. Determining the question-and-answer pattern for the Nth question based on the content of the first question includes: if the question-and-answer pattern is the preset question-and-answer pattern, then determining the intent correction rule corresponding to the preset question-and-answer pattern; correcting the first intent corresponding to the content of the first question based on the intent correction rule, and using the corrected intent as the target intent.
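[0011] and [0012] can be sketched as a keyword lookup followed by an optional rule-based correction. The template keywords, pattern names, intent labels, and the correction rule below are invented for illustration; the disclosure does not specify the template contents or the form of a correction rule. Substring matching on the lowercased question stands in for whatever keyword extraction is actually used.

```python
from typing import Callable, Dict, Optional

OTHER_PATTERN = "other"

# Hypothetical matching template: keyword -> preset question-and-answer pattern.
MATCHING_TEMPLATE: Dict[str, str] = {
    "how about": "follow_up",
    "what about": "follow_up",
}

# Hypothetical correction rules: (first_intent, previous_intent) -> corrected intent.
CORRECTION_RULES: Dict[str, Callable[[str, str], str]] = {
    # Assumed rule: a follow-up question inherits the previous round's intent.
    "follow_up": lambda first_intent, previous_intent: previous_intent,
}


def find_pattern(first_question: str) -> str:
    """Return the preset pattern if a template keyword occurs, else OTHER_PATTERN."""
    text = first_question.lower()
    for keyword, pattern in MATCHING_TEMPLATE.items():
        if keyword in text:
            return pattern
    return OTHER_PATTERN


def correct_intent(first_question: str, first_intent: str,
                   previous_intent: str) -> Optional[str]:
    """Apply the preset pattern's rule; None means 'other' pattern (use S12/S13)."""
    pattern = find_pattern(first_question)
    if pattern == OTHER_PATTERN:
        return None
    return CORRECTION_RULES[pattern](first_intent, previous_intent)
```

Returning `None` for the other pattern mirrors [0010], where questions outside the preset pattern continue to the semantic-association and intent comparison.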

[0013] According to a second aspect of the present disclosure, an intent recognition device is provided, comprising: an acquisition unit, configured to acquire a first question content of an Nth question and a second question content of an (N-1)th question, wherein N is an integer greater than 1; and a processing unit, configured to determine, based on the first question content and the second question content, whether there is a semantic association between the first question content and the second question content, and whether a first intent and a second intent are the same, and, if there is a semantic association between the first question content and the second question content, and the first intent and the second intent are different, generate a target intent based on the first intent and the second question content, and use the target intent as the intent of the Nth question; wherein the first intent is the intent corresponding to the first question content, and the second intent is the intent corresponding to the second question content.

[0014] In one implementation, the processing unit determines whether there is a semantic relationship between the first question content and the second question content, and whether the first intent and the second intent are the same, by: obtaining the content of the (N-1)th question and answer, wherein the content of the (N-1)th question and answer includes the second question content and the corresponding response content; calling the target model to extract semantic features from the first question content and the content of the (N-1)th question and answer, obtaining the first semantic feature corresponding to the first question content and the second semantic feature corresponding to the content of the (N-1)th question and answer; and based on the first semantic feature and / or the second semantic feature, determining whether there is a semantic relationship between the first question content and the second question content, and determining whether the first intent and the second intent are the same.

[0015] In one implementation, the processing unit is further configured to: if it is determined that there is a semantic association between the first question content and the second question content, and that the first intent is the same as the second intent, determine the first intent or the second intent as the intent of the user's Nth question.

[0016] In one implementation, the target model is trained as follows: training samples are input into an initial model, wherein each training sample includes a question-answer pair, the question-answer pair including a question and a corresponding response, each training sample corresponds to an intent, and the training samples are obtained based on question-answer pair segmentation from multi-turn dialogues; semantic features are extracted from the training samples to obtain semantic features corresponding to the training samples; the initial model is invoked to classify the training samples based on the semantic features corresponding to the training samples, wherein the classification of the training samples includes semantic classification and / or intent classification; based on the classification results of the training samples by the initial model, a corresponding loss function is determined; and the initial model is adjusted based on the loss function to obtain the target model.

[0017] In one implementation, the processing unit generates a target intent based on the first intent and the second question content in the following manner: it calls a preset Large Language Model (LLM) to rewrite the first question content based on the second question content to obtain the rewritten question content; it determines the intent corresponding to the rewritten question content and identifies the intent as the target intent.

[0018] In one implementation, the processing unit is further configured to: determine the question-and-answer pattern of the Nth question based on the content of the first question, and determine that the question-and-answer pattern of the Nth question is another pattern, wherein the other question-and-answer pattern is different from the preset question-and-answer pattern.

[0019] In one implementation, the processing unit determines the question-and-answer pattern of the Nth question based on the content of the first question in the following manner: determining a first keyword in the content of the first question; if the first keyword is a keyword in a preset matching template, then determining the question-and-answer pattern corresponding to the Nth question as a preset question-and-answer pattern; if the first keyword is not a keyword in a preset matching template, then determining the question-and-answer pattern corresponding to the Nth question as another question-and-answer pattern, wherein the other question-and-answer pattern is different from the preset question-and-answer pattern.

[0020] In one implementation, when the preset question-and-answer pattern corresponds to a preset intent correction rule, the processing unit determines the question-and-answer pattern of the Nth question based on the content of the first question in the following manner: when the question-and-answer pattern is the preset question-and-answer pattern, determine the intent correction rule corresponding to the preset question-and-answer pattern; correct the first intent corresponding to the content of the first question based on the intent correction rule, and take the corrected intent as the target intent.

[0021] According to a third aspect of the present disclosure, an electronic device is provided, comprising: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to: execute the intent recognition method described in the first aspect or any embodiment of the first aspect.

[0022] According to a fourth aspect of the present disclosure, a storage medium is provided, the storage medium storing instructions that, when executed by a processor, enable the processor to perform the intent recognition method described in the first aspect or any embodiment of the first aspect.

[0023] According to a fifth aspect of the present disclosure, a computer program product is provided, the computer program product including a computer program that, when executed by a processor, implements the intent recognition method described in the first aspect or any embodiment of the first aspect.

[0024] The technical solutions provided by the embodiments of this disclosure can include the following beneficial effects: Since the first question content of the Nth question and the second question content of the (N-1)th question are two adjacent question contents, determining whether there is a semantic relationship between the first and second question contents can determine whether there is a semantic relationship between two adjacent question contents. This allows for the determination of whether the target intent needs to be determined based on the (N-1)th question content. Furthermore, by determining whether the intents of the first and second question contents are the same, it can be determined whether the intent has changed during multi-turn dialogue, improving the accuracy of target intent recognition. This makes the intent determination result of the Nth question content more accurate, providing users with a better dialogue experience.

[0025] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this disclosure.

Attached Figure Description

[0026] The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure.

[0027] Figure 1 is a flowchart illustrating an intent recognition method according to an exemplary embodiment.

[0028] Figure 2A is a flowchart illustrating a method for applying a target model according to an exemplary embodiment.

[0029] Figure 2B is a schematic diagram illustrating a scenario of a target model application according to an exemplary embodiment.

[0030] Figure 3A is a flowchart illustrating a method for training a target model according to an exemplary embodiment.

[0031] Figure 3B is a schematic diagram of the architecture of an initial model according to an exemplary embodiment.

[0032] Figure 4A is a flowchart illustrating a method for generating a target intent according to an exemplary embodiment.

[0033] Figure 4B is a schematic diagram illustrating a scenario for determining a target intent according to an exemplary embodiment.

[0034] Figure 4C is a schematic diagram illustrating a method for determining rewritten question content according to an exemplary embodiment.

[0035] Figure 5 is a flowchart illustrating a method for determining a target intent according to an exemplary embodiment.

[0036] Figure 6A is a flowchart illustrating a method for determining the question-and-answer pattern of the Nth question according to an exemplary embodiment.

[0037] Figure 6B is a schematic diagram of an intent recognition scenario according to an exemplary embodiment.

[0038] Figure 7 is block diagram one of an intent recognition device according to an exemplary embodiment.

[0039] Figure 8 is block diagram two of an intent recognition device according to an exemplary embodiment.

[0040] Figure 9 is block diagram three of an intent recognition device according to an exemplary embodiment.

Detailed Implementation

[0041] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this disclosure.

[0042] In machine learning, particularly in natural language processing, multi-turn dialogue scenarios are common. Examples include task-oriented dialogue. Task-oriented dialogue refers to a specific type of dialogue pattern whose purpose is to assist users in completing a specific task or goal by recognizing their dialogue intent.

[0043] Taking task-oriented dialogue as an example, in multi-turn dialogue scenarios, to efficiently assist users in completing tasks or achieving goals, it is usually necessary to identify the user's intent when asking questions. For example, related technologies typically implement this based on methods such as A1 or B1:

[0044] A1) Directly use fixed historical dialogue information (e.g., N fixed segments of historical dialogue, where N is an integer greater than or equal to 1) to identify the user's intent in asking questions.
B1) Encode all historical dialogue information with a memory network, such as a Recurrent Neural Network (RNN) or a Long Short-Term Memory (LSTM) network, to identify the user's intent in asking questions.

[0045] However, for A1), the fixed historical dialogue information may miss dialogue information relevant to the current question, leading to inaccurate identification of user intent. For B1), encoding all historical dialogues, while reducing the possibility of missing dialogue information relevant to the current question, may introduce more noise, thus also resulting in inaccurate identification of user intent.

[0046] To solve the aforementioned technical problems, this disclosure proposes an intent recognition method. The method identifies whether the current round's question content is semantically associated with the previous round's question content and whether the intent of the current round's question has changed compared with the previous round, and determines the intent of the current round's question from these results. On the one hand, the semantic-association and intent-change results determine whether the previous round's question content needs to be combined when determining the current round's intent. This reduces the noise introduced by irrelevant historical information during intent recognition and improves the accuracy of the results. On the other hand, by comparing the intent corresponding to the current round's question content with the intent of the previous round's question, the method detects whether the intent has changed, which reduces the possibility of missing key information in the historical dialogue when the intent changes and further improves the accuracy of the results.

[0047] It should be noted that the intent recognition method provided in this disclosure can be applied to devices. Devices may include, for example, terminals or servers. Terminals include, but are not limited to, at least one of the following: mobile phone, wearable device, Internet of Things device, car with communication capabilities, smart car, tablet computer, computer with wireless transceiver capabilities, virtual reality (VR) terminal device, augmented reality (AR) terminal device, wireless terminal device in industrial control, wireless terminal device in self-driving, wireless terminal device in remote medical surgery, wireless terminal device in smart grid, wireless terminal device in transportation safety, wireless terminal device in smart city, and wireless terminal device in smart home. Servers may include, but are not limited to, independent physical servers, server clusters or distributed systems composed of multiple physical servers, or cloud servers that provide basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.

[0048] The embodiments of this application can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, smart transportation, and assisted driving.

[0049] The various data involved in the embodiments of this application (such as the content of the first question, the content of the second question, training samples, or various models, etc.) can be stored in the blockchain so that the data demander can obtain them, and the data trustworthiness is guaranteed by the immutability mechanism of the blockchain.

[0050] It should be noted that all acquisition of signals, information, or data in this disclosure is carried out in compliance with the data protection laws and policies of the applicable jurisdiction and with authorization from the owner of the relevant device.

[0051] In the embodiments of this disclosure, terms such as “in response to…”, “in response to determining…”, “in the case of…”, “when…”, and “if…” can be used interchangeably.

[0052] In the embodiments disclosed herein, terms such as “greater than”, “greater than or equal to”, “not less than”, “more than”, “more than or equal to”, “higher than”, “higher than or equal to”, “not lower than”, and “above” can be substituted for each other, and terms such as “less than”, “less than or equal to”, “not greater than”, “not more than”, “lower than”, “lower than or equal to”, “not higher than”, and “below” can be substituted for each other.

[0053] For ease of understanding, some technical terms involved in the embodiments of this disclosure will be explained by way of example below:

[0054] Intent: In the fields of NLP and Artificial Intelligence (AI), intent can be understood as the purpose or goal behind user input. It can typically be processed using algorithms related to intent recognition.

[0055] Semantic association relationship: In the field of NLP, semantic association relationship refers to the semantic connection between different linguistic units (such as words, phrases, and sentences). These associations can be direct or indirect, and they help us understand the deeper meanings and structures in language. In embodiments of this disclosure, semantic association relationships may include at least one of the following: synonymy, antonymy, hyponymy, hypernymy, meronymy, holonymy, causality, part-whole relationship, coordination, or subordination, etc.

[0056] Bidirectional Encoder Representations from Transformers (BERT) is a pre-trained language representation model in the field of machine learning. For classification tasks, a BERT model typically adds one or more fully connected layers (also called a classification head) on top of the pre-trained encoder. These layers transform the BERT output into a class probability distribution. During the fine-tuning phase, the newly added layers are trained together with the base layers of BERT to learn how to classify input text into predefined categories.

[0057] Tokenization: In machine learning, especially in Natural Language Processing (NLP), tokenization refers to the process of dividing text data into a series of tokens. These tokens can be words, phrases, sentences, or even smaller sequences of characters. Tokenization is an important step in text preprocessing, aiming to transform continuous text strings into a format that is easier for machine learning models to process.
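A toy tokenizer makes the idea concrete. It splits English text into lowercase word and punctuation tokens with a regular expression. This is only an illustration: models such as BERT use learned subword vocabularies (WordPiece), and Chinese text usually needs character- or subword-level segmentation.

```python
import re
from typing import List

# A token is a run of letters/digits/apostrophes, or one non-space punctuation character.
TOKEN_RE = re.compile(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']")


def tokenize(text: str) -> List[str]:
    """Split text into lowercase word and punctuation tokens."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]
```

For example, `tokenize("What about Shanghai?")` returns `['what', 'about', 'shanghai', '?']`.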

[0058] The softmax activation function (hereinafter referred to as the softmax function) is a commonly used activation function, often used in the output layer of multi-class classification problems. The softmax function transforms a real-valued vector into a probability distribution in which each element lies between 0 and 1 and all elements sum to 1. In neural networks, the softmax function is typically used in the last layer when the model needs to output a class distribution. For example, in an image recognition task where the model must identify the class of an image from M different classes, the last layer of the network outputs a vector of M elements. After the softmax function is applied, the probability of each class is obtained, and the class with the highest probability can be selected as the prediction result.
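The softmax function computes softmax(z)_i = exp(z_i) / Σ_j exp(z_j) for an M-element vector z. A minimal sketch in plain Python, including the arg-max prediction step described above:

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Map a real-valued vector to a probability distribution."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow in exp().
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits: Sequence[float]) -> int:
    """Index of the most probable of the M = len(logits) classes."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)
```

For example, `predict([1.0, 2.0, 3.0])` returns 2, and `softmax([1000.0, 1000.0])` returns `[0.5, 0.5]` without overflowing.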

[0059] Bidirectional Long Short-Term Memory (BiLSTM) is a special type of recurrent neural network (RNN) that can simultaneously capture forward and backward dependencies in sequential data.

[0060] In a standard LSTM network, information flows in one direction along the time series. BiLSTM combines two LSTM networks, one processing the time series in the forward direction and the other in the reverse direction. The outputs of the two LSTM networks are typically combined (e.g., by concatenation, summation, or averaging) to produce the final output, thus using both the preceding and following context of the sequence.
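The bidirectional mechanics can be sketched independently of the recurrent cell. In the sketch, `ema_cell` is a hypothetical placeholder (an exponential moving average, not a real LSTM cell), so only the forward pass, the backward pass, and the combination step are illustrated. The backward outputs are reversed back into the original time order before combining, so each position sees both its preceding and following context.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Cell = Callable[[Vector, Vector], Vector]  # (state, input) -> new state


def run_rnn(cell: Cell, inputs: Sequence[Vector], init: Vector) -> List[Vector]:
    """Unroll a recurrent cell over a sequence, returning the state at each step."""
    state, outputs = init, []
    for x in inputs:
        state = cell(state, x)
        outputs.append(state)
    return outputs


def bidirectional(cell_fw: Cell, cell_bw: Cell, inputs: Sequence[Vector],
                  init: Vector, combine: str = "concat") -> List[Vector]:
    """Run forward and backward passes and combine their per-step outputs."""
    fw = run_rnn(cell_fw, inputs, init)
    # Process the reversed sequence, then re-reverse so index t lines up with inputs[t].
    bw = run_rnn(cell_bw, list(reversed(inputs)), init)[::-1]
    if combine == "concat":
        return [f + b for f, b in zip(fw, bw)]
    if combine == "sum":
        return [[a + c for a, c in zip(f, b)] for f, b in zip(fw, bw)]
    if combine == "mean":
        return [[(a + c) / 2 for a, c in zip(f, b)] for f, b in zip(fw, bw)]
    raise ValueError(f"unknown combine mode: {combine}")


def ema_cell(state: Vector, x: Vector) -> Vector:
    """Placeholder cell (NOT an LSTM): exponential moving average of the inputs."""
    return [0.5 * s + 0.5 * xi for s, xi in zip(state, x)]
```

For example, `bidirectional(ema_cell, ema_cell, [[1.0], [0.0], [0.0]], [0.0])` returns `[[0.5, 0.5], [0.25, 0.0], [0.125, 0.0]]`: the first position already reflects the later (all-zero) inputs through the backward half. In a real model, `torch.nn.LSTM(..., bidirectional=True)` performs these passes and concatenates the two directions.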

[0061] Rewriting refers to transforming a piece of text into another form of expression while preserving the original meaning. The types of rewriting that may be involved in this disclosure include at least one of the following: sentence rewriting (e.g., grammatical rewriting), text simplification, and style conversion (e.g., transforming written expression style into spoken expression style).

[0062] Chain of thought (COT) is a method for breaking down complex problems or tasks into a series of simpler steps. This method helps models better understand and process the problem.
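As one hedged illustration of how chain-of-thought prompting might be applied to the question rewriting described in this disclosure, the helper below builds a prompt that asks the model to work through explicit intermediate steps before giving the rewritten question, and then extracts the final answer. The step wording and the `Rewritten:` output marker are assumptions made for the example.

```python
# Hypothetical intermediate steps for rewriting a follow-up question.
COT_STEPS = [
    "Identify the intent of the previous question.",
    "Identify what the current question asks and which information it omits.",
    "Decide whether the omitted information can be taken from the previous question.",
    "Write a self-contained version of the current question.",
]


def build_cot_prompt(previous: str, current: str) -> str:
    """Assemble a step-by-step rewriting prompt."""
    lines = [f"Previous question: {previous}", f"Current question: {current}",
             "Think step by step:"]
    lines += [f"{i}. {step}" for i, step in enumerate(COT_STEPS, start=1)]
    lines.append("Finish with a line of the form 'Rewritten: <question>'.")
    return "\n".join(lines)


def parse_rewritten(response: str) -> str:
    """Extract the final rewritten question from the model's step-by-step reply."""
    for line in reversed(response.splitlines()):
        if line.startswith("Rewritten:"):
            return line[len("Rewritten:"):].strip()
    raise ValueError("no 'Rewritten:' line in response")
```

Keeping the reasoning steps in the reply but parsing only the marked final line lets the intermediate steps guide the model without leaking into the rewritten question.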

[0063] Figure 1 is a flowchart illustrating an intent recognition method according to an exemplary embodiment. As shown in Figure 1, the intent recognition method can be applied to a terminal or server and includes the following steps.

[0064] In step S11, the first question content of the Nth question and the second question content of the (N-1)th question are obtained.

[0065] In step S12, based on the first question content and the second question content, it is determined whether there is a semantic association between the first question content and the second question content, and whether the first intent and the second intent are the same.

[0066] In step S13, in response to there being a semantic association between the first question content and the second question content and the first intent being different from the second intent, a target intent is generated based on the first intent and the second question content, and the target intent is used as the intent of the Nth question.

[0067] Here, N is an integer greater than 1, the first intent is the intent corresponding to the first question content, and the second intent is the intent corresponding to the second question content.
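Steps S11 to S13 can be sketched as follows. The `TargetModel` protocol, its `analyze` method, the `Analysis` record, and the `generate_target_intent` callable are hypothetical stand-ins for the target model of [0006] and the LLM-based rewriting of [0009]. The sketch returns `None` for the cases that step S13 does not cover, which other embodiments handle separately.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Protocol


@dataclass
class Analysis:
    associated: bool    # semantic association between the two question contents
    first_intent: str   # intent of the Nth question content on its own
    second_intent: str  # intent of the (N-1)th question content


class TargetModel(Protocol):
    def analyze(self, first_question: str, previous_turn: str) -> Analysis: ...


def recognize_intent(
    questions: List[str],  # questions[0] is round 1, questions[-1] is round N
    answers: List[str],    # answers[i] responds to questions[i]; round N may be unanswered
    model: TargetModel,
    generate_target_intent: Callable[[str, str], str],
) -> Optional[str]:
    n = len(questions)
    if n < 2:
        raise ValueError("N must be greater than 1")
    # S11: first question content (round N) and second question content (round N-1).
    first_question, second_question = questions[n - 1], questions[n - 2]
    # S12: analyze the Nth question with the (N-1)th question and its response.
    previous_turn = second_question + "\n" + answers[n - 2]
    result = model.analyze(first_question, previous_turn)
    # S13: associated but the intent changed, so generate a target intent.
    if result.associated and result.first_intent != result.second_intent:
        return generate_target_intent(first_question, second_question)
    return None
```

Passing the (N-1)th response along with the (N-1)th question follows [0006], which feeds the whole previous question-and-answer turn to the target model.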

[0068] In this embodiment, since the first question content of the Nth question and the second question content of the (N-1)th question are adjacent, determining whether a semantic relationship exists between the first and second question content can help determine whether a semantic relationship exists between adjacent question content. This allows us to decide whether to determine the target intent based on the (N-1)th question content. Furthermore, determining whether the intents of the first and second question content are the same helps determine whether the intent has changed during multi-turn dialogue, improving the accuracy of target intent recognition. This makes the intent determination result of the Nth question content more accurate, providing a better dialogue experience for the user.

[0069] It should be noted that the first question can include at least one of the following: text-based, audio-based, or image-based questions. The second question is similar to the first and will not be elaborated upon here.

[0070] In some embodiments, the first intent can be understood as an intent determined based on the acquired first question content. This intent is determined solely based on the first question content and is not associated with any context. For example, the first question content is input into a preset intent classification model, and the first intent is determined based on the output of the preset intent classification model.

[0071] In some embodiments, the target intent can be understood as an intent based on consideration of the content of historical dialogues (e.g., an intent that combines the corresponding intents of historical dialogues, or an intent that discards the historical dialogues, etc.).

[0072] It should be noted that the results of determining whether a semantic association exists between the first question content and the second question content, and whether the first intent and the second intent are the same (hereinafter referred to as the "results"), can include, for example, the following cases A3) to D3):

[0073] A3) The first and second question content are semantically associated, and the first and second intents are different.
B3) The first and second question content are semantically associated, and the first and second intents are the same.
C3) The first and second question content are not semantically associated, and the first and second intents are the same.
D3) The first and second question content are not semantically associated, and the first and second intents are different.

[0074] To facilitate understanding, the relevant examples listed in Table 1 will be used to illustrate the relevant situations of A3) to D3) above.

[0075] Table 1

[0076]

[0077] The examples in Table 1 above show that different results correspond to different dialogue scenarios. Therefore, when determining the target intent, not every result requires the content of the previous question to be taken into account.

[0078] For example, in cases C3) and D3), Q1 and Q2 may have no semantic association (or only a weak one). In these cases, combining Q1 and Q2 may therefore make the generated target intent inaccurate.

[0079] Therefore, in some embodiments, in cases C3) and/or D3), the target intent is not determined based on the content of Q1 but based on the intent corresponding to Q2 (e.g., the intent corresponding to Q2 is taken as the target intent, or the expression of that intent is changed and the changed expression is taken as the target intent), and the target intent is taken as the intent of the Nth question.

[0080] As another example, in case B3), Q1 and Q2 may have a semantic association (or a strong one), and the intent corresponding to Q1 is the same as the intent corresponding to Q2. In this case, the intent corresponding to Q1 (or to Q2) can be used directly as the target intent. This keeps the target intent closely tied to the intent of the multi-turn dialogue, reduces the processing needed to determine the target intent, and improves efficiency.

[0081] Therefore, in some embodiments, in response to determining that a semantic association exists between the first question content and the second question content and that the first intent and the second intent are the same, the first intent or the second intent is determined as the intent of the user's Nth question. For example, in case B3), the intent corresponding to the previous question content (e.g., Q1) or the intent corresponding to the current question content (e.g., Q2) can be used as the target intent, and the target intent is used as the intent of the Nth question.

[0082] As another example, in case A3), Q1 and Q2 may have a semantic association (or a strong one), and the intent corresponding to Q1 differs from the intent corresponding to Q2. In this case, the intent corresponding to Q2 can be adapted (e.g., rewritten) based on the content of Q1 to generate the target intent, so that the rewritten intent is easier for the model to understand. For example, if the rewritten intent is "Analysis of the reasons for the increase in sales of product B in region A", this rewritten intent is used as the target intent.

[0083] Therefore, in some embodiments, in response to a semantic association existing between the first question content and the second question content and the first intent being different from the second intent, a target intent is generated based on the first intent and the second question content, and the target intent is used as the intent for the Nth question.
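The case handling in [0079] to [0083] can be summarized as a small dispatch function. The sketch below is illustrative only: the name `resolve_target_intent` and the `rewrite` callable (a stand-in for whatever adapts the first intent using the second question content) are hypothetical, and the two boolean inputs are assumed to come from an upstream check such as the target model.

```python
from typing import Callable


def resolve_target_intent(
    related: bool,
    same_intent: bool,
    first_intent: str,
    second_question: str,
    rewrite: Callable[[str, str], str],
) -> str:
    """Return the intent used for the Nth question.

    related      -- whether the first (Nth) and second ((N-1)th) question contents are associated
    same_intent  -- whether the first intent equals the second intent
    rewrite      -- hypothetical callable adapting the first intent with the second question content
    """
    if related and not same_intent:
        # Case A3): generate the target intent from the first intent and the second question content
        return rewrite(first_intent, second_question)
    if related and same_intent:
        # Case B3): both intents are identical, so either one can be reused directly
        return first_intent
    # Cases C3) and D3): no semantic association, so the history is not used
    return first_intent
```

Only case A3) pays for a rewriting call; the other cases reuse an intent that has already been computed, which matches the efficiency argument in [0080].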

[0084] Understandably, some multi-turn dialogue scenarios involve more than two rounds. For example, if the current round is the Nth question, the historical dialogue information may contain the (N-1)th question, the (N-2)th question, the (N-3)th question, and so on, up to the (N-M)th question (where M is an integer greater than or equal to 1, and N is an integer greater than or equal to M). In this case, semantic associations and intents can be determined, for example, by combining the current round's question content with the content of previous questions.

[0085] To facilitate understanding, the above scenario will be described in Table 2 below using dialogue intent as an example.

[0086] Table 2

[0087]

[0088] In the example in Table 2 above, the multi-turn dialogue has 4 rounds. Because no question precedes the (N-3)th question, the target intent of the (N-3)th question content cannot be determined with historical information. For the (N-2)th question content, the target intent can be determined based on Q1. For the (N-1)th question content, the target intent can be determined based on the question content Q2 of the previous round (the (N-2)th). For the Nth question content Q4, because it has no semantic association with the (N-1)th question content, the target intent can be determined without relying on historical information, for example as described for the embodiments related to Table 1 above, which is not repeated here.
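The walk-through of Table 2 amounts to a per-turn decision about which earlier question, if any, serves as context. A minimal sketch, assuming each turn is given as a (question, related_to_previous) pair with the association flag produced upstream; the function name is hypothetical, and only the immediately preceding turn is consulted, mirroring the pairwise Nth versus (N-1)th comparison.

```python
from typing import List, Optional, Tuple


def context_for_turns(turns: List[Tuple[str, bool]]) -> List[Optional[str]]:
    """For each (question, related_to_previous) turn, return the previous question used as context."""
    contexts: List[Optional[str]] = []
    for idx, (_question, related) in enumerate(turns):
        if idx == 0 or not related:
            # The first turn has no history; unrelated turns are resolved without history
            contexts.append(None)
        else:
            contexts.append(turns[idx - 1][0])
    return contexts
```

With turns `[("Q1", False), ("Q2", True), ("Q3", True), ("Q4", False)]`, the result is `[None, "Q1", "Q2", None]`, matching the four rounds described above.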

[0089] Therefore, through the above embodiments, it is possible to determine the semantic and intent relationships between the current and previous question content in a multi-turn dialogue scenario based on the current question content and the previous question content, and to determine the target intent in different ways based on different results, so as to confirm the intent of the Nth question.

[0090] Determining whether a semantic association exists between the first question content and the second question content, and whether the first and second intents are the same, can be achieved, for example, through a corresponding model (hereinafter referred to as the target model). For ease of understanding, the use of the target model is illustrated below with reference to Figure 2A, taking case A3) above as an example.

[0091] Figure 2A is a flowchart illustrating a method for applying a target model according to an exemplary embodiment. As shown in Figure 2A, the method for applying the target model includes the following steps.

[0092] In step S21, the content of the (N-1)th question and answer is obtained.

[0093] In step S22, the target model is invoked to extract semantic features from the content of the first question and the content of the (N-1)th question and answer, thereby obtaining the first semantic feature corresponding to the content of the first question and the second semantic feature corresponding to the content of the (N-1)th question and answer.

[0094] In step S23, based on the first semantic feature and/or the second semantic feature, it is determined whether a semantic association exists between the first question content and the second question content, and whether the first intent and the second intent are the same.

[0095] The (N-1)th question-and-answer content includes the second question content and the corresponding response.

[0096] In this embodiment, semantic features are extracted from the content of the first question and the content of the (N-1)th question and answer based on a trained target model. The extracted features are then used to determine the corresponding semantic relationships and whether the first intent and the second intent are the same. Compared to manual annotation or classification, using a trained target model improves the efficiency of determining the semantic relationships between the first and second question contents, as well as the efficiency of determining the first and second intents.

[0097] For ease of understanding, the (N-1)th question-and-answer content and the first question content of the Nth question are illustrated below with an example.

[0098] Assume a multi-turn dialogue Q01-Q02, including the following:

[0099] Q01: I would like to know the sales trend of product A.

[0100] A01: Over the past period, the sales trend of product A has been upward, increasing by B%.

[0101] Q02: Now, predict the sales trend of product A in the near future.

[0102] In the above multi-turn dialogue, taking Q02 as the first question content of the Nth round, Q01 can be understood as the question of the (N-1)th round of question and answer (i.e., the second question content), and A01 as the response to the second question.
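Steps S21 and S22 feed the (N-1)th question-and-answer content together with the first question content into the target model. One common way to present such inputs to a BERT-type encoder is a single sequence with `[CLS]`/`[SEP]` markers; the sketch below assumes that convention and an ordering of previous question, previous answer, current question, neither of which is specified in the text. The function name is hypothetical.

```python
def build_model_input(prev_question: str, prev_answer: str, current_question: str) -> str:
    """Concatenate the (N-1)th question-and-answer content with the first (Nth) question content.

    The BERT-style [CLS]/[SEP] markers and the segment order are assumptions for illustration.
    """
    return f"[CLS] {prev_question} [SEP] {prev_answer} [SEP] {current_question} [SEP]"
```

For the dialogue above, the call would be `build_model_input(Q01_text, A01_text, Q02_text)`, placing Q01 and A01 as context before Q02.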

[0103] In some embodiments, for question-and-answer content (such as Q01 and A01 in the example above), the intent of the question content can be determined, for example, solely from the question content itself (such as Q01). Alternatively, it can be determined from the question-and-answer content as a whole, for example, by combining relevant features of A01 with the intent recognition performed on Q01 to determine the intent of Q01. Compared with identifying the intent from the question content alone, determining it from the question-and-answer content can make the intent more accurate and thereby improve the user experience.

[0104] Therefore, the above target model can be used to determine whether a semantic association exists between the first question content and the second question content, and whether the first intent and the second intent are the same.

[0105] For ease of understanding, the application scenario of the target model is illustrated below with reference to Figure 2B. Figure 2B is a schematic diagram illustrating an application scenario of the target model according to an exemplary embodiment.

[0106] In Figure 2B, once the (N-1)th question-and-answer content and the first question content are obtained, the target model is invoked and both are input into it. Based on these inputs, the target model executes Task 1 and Task 2. Task 1 (contextual association recognition) can be understood as invoking the target model to determine whether a semantic association exists between the first and second question content. Task 2 (intent switching recognition) can be understood as invoking the target model to determine whether the first and second intents are the same (if they are the same, the intent is considered not to have switched; if they differ, the intent is considered to have switched). After the target model executes Task 1 and Task 2, the corresponding case (e.g., one of cases A3) to D3) above) can be determined from the task results, and the corresponding subsequent processing is performed for that case.
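Mapping the two task outputs onto cases A3) to D3) is a two-bit lookup. The sketch assumes each task returns a two-class probability distribution in the label order noted in the comments; that order and the function name `classify_case` are hypothetical.

```python
from typing import Sequence


def classify_case(task1_probs: Sequence[float], task2_probs: Sequence[float]) -> str:
    """Map the two task outputs of the target model to one of the cases A3)-D3).

    Assumed label order (hypothetical):
      task1_probs = [P(no association), P(association)]   # Task 1: contextual association
      task2_probs = [P(not switched), P(switched)]        # Task 2: intent switching
    """
    related = task1_probs[1] > task1_probs[0]
    switched = task2_probs[1] > task2_probs[0]
    if related and switched:
        return "A3"  # associated, intents differ
    if related:
        return "B3"  # associated, intents identical
    if not switched:
        return "C3"  # not associated, intents identical
    return "D3"      # not associated, intents differ
```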

[0107] The target model can be trained, for example, as in the embodiment shown in Figure 3A. Figure 3A is a flowchart illustrating a method for training a target model according to an exemplary embodiment. As shown in Figure 3A, the training method for the target model includes the following steps.

[0108] In step S31, the training samples are input into the initial model.

[0109] In step S32, the initial model is invoked to classify the training samples based on the semantic features corresponding to the training samples.

[0110] In step S33, the corresponding loss function is determined based on the classification results of the training samples by the initial model.

[0111] In step S34, the initial model is adjusted based on the loss function to obtain the target model.

[0112] Each training sample includes a question-answer pair, which consists of a question and its corresponding response, and each training sample corresponds to an intent. The training samples are obtained by splitting multi-turn dialogues into question-answer pairs. Classification of the training samples includes semantic classification and/or intent classification.
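Steps S31 to S34 can be illustrated with a deliberately small stand-in: two binary classification heads trained by gradient descent on a summed loss, on top of fixed feature vectors. This is a sketch under strong assumptions, not the described model: the feature vectors stand in for encoder output (the encoder itself is not trained here), each head is a logistic classifier (equivalent to a two-class softmax), and all names are hypothetical.

```python
import math
from typing import List, Tuple

# (semantic feature vector, Task 1 label: associated?, Task 2 label: switched?)
Sample = Tuple[List[float], int, int]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bce(p: float, y: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for a single prediction."""
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def train_two_heads(samples: List[Sample], lr: float = 0.5, epochs: int = 200):
    dim = len(samples[0][0]) + 1          # +1 for a bias term
    w_c = [0.0] * dim                     # Task 1 head (contextual association)
    w_i = [0.0] * dim                     # Task 2 head (intent switching)
    history: List[float] = []
    for _ in range(epochs):
        g_c = [0.0] * dim
        g_i = [0.0] * dim
        total = 0.0
        for x, y_c, y_i in samples:       # S31/S32: input the samples and classify them
            xb = x + [1.0]
            p_c = sigmoid(sum(w * v for w, v in zip(w_c, xb)))
            p_i = sigmoid(sum(w * v for w, v in zip(w_i, xb)))
            total += bce(p_c, y_c) + bce(p_i, y_i)   # S33: summed loss of both tasks
            for k, v in enumerate(xb):
                g_c[k] += (p_c - y_c) * v            # d(bce)/d(logit) = p - y
                g_i[k] += (p_i - y_i) * v
        n = len(samples)
        # S34: one gradient-descent step on the averaged joint loss
        w_c = [w - lr * g / n for w, g in zip(w_c, g_c)]
        w_i = [w - lr * g / n for w, g in zip(w_i, g_i)]
        history.append(total / n)         # loss measured before this epoch's update
    return w_c, w_i, history
```

Because the two heads have separate parameters and the losses are simply added, each head receives the gradient of its own loss only; sharing happens in the encoder, which this sketch leaves out.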

[0113] In this embodiment of the disclosure, classifying the training samples includes semantic classification and/or intent classification. By training the initial model to perform classification based on the semantic features of the training samples, the resulting target model is able to perform semantic classification and/or intent classification. Furthermore, because the training samples are question-answer pairs, the trained target model is able to perform semantic classification and/or intent classification of question-answer pairs.

[0114] Understandably, as can be seen from the embodiments related to Figure 1, Figure 2A, or Figure 2B above, the target model needs to be able to perform multiple classification tasks simultaneously (e.g., two classification tasks). Therefore, in some embodiments, a BERT-type model can be selected as the initial model.

[0115] It is also understandable, from the embodiments related to Figure 1, Figure 2A, or Figure 2B above, that the obtained data such as the first question content and the second question content may come in various formats. For the model to process the input data successfully, the input needs to be preprocessed, for example encoded (embedded). Thus, in some embodiments, an initial encoding layer can be added to the initial model to encode the data input into it.

[0116] For ease of understanding, the architecture of the initial model is illustrated below with reference to Figure 3B. Figure 3B is a schematic diagram of the architecture of an initial model according to an exemplary embodiment.

[0117] As shown in Figure 3B, the model architecture is mainly divided into two parts: feature extraction and task extraction. The feature extraction part tokenizes the input content (e.g., the first question content and the historical question-and-answer content), inputs the tokenization result into BERT for feature extraction, and passes the resulting encoded features to the task extraction part. The task extraction part can be understood as the network components or other algorithmic parts of the architecture used to perform the tasks, and can be further divided into a part for performing Task 1 and a part for performing Task 2.

[0118] Task 1 can be understood, for example, as the task of contextual association recognition (see Figure 2B and the related embodiments). The part of the initial model used to perform Task 1 may include, for example, a fully connected layer and an output layer. Because Task 1 is a classification task, the activation function of the output layer can be, for example, a softmax function. The fully connected layer maps the encoded features to the labels of this single task (Task 1), and the output layer uses the softmax function to produce a probability distribution over the output categories.
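A fully connected layer followed by a softmax output layer can be written in a few lines. This is a plain-Python sketch for illustration; the weight layout (one row per output class) and the function names are assumptions, and a real model would use a deep-learning framework.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def fc_softmax_head(features: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer (one weight row per class) followed by a softmax output layer."""
    logits = [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weights, bias)]
    return softmax(logits)
```

The returned list is the probability distribution over the task's categories, for example [P(no association), P(association)] for Task 1.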

[0119] Task 2 can be understood, for example, as the task of intent switching recognition (see Figure 2B and the related embodiments). The part of the initial model used to perform Task 2 may include, for example, a bidirectional long short-term memory (BiLSTM) network, a fully connected layer, and an output layer. Because Task 2 is also a classification task, the activation function of its output layer can likewise be, for example, a softmax function. The BiLSTM processes the feature sequence in both the forward and backward directions, so the features it produces can model contextual information (e.g., the second question content and the corresponding question-and-answer content, and/or the first question content and the (N-1)th question-and-answer content). The fully connected layer and the output layer work as described in the example above and are not elaborated upon here.
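The bidirectional wiring can be shown without the full LSTM machinery. In the sketch below a plain tanh recurrent cell stands in for the LSTM cell purely to keep it short (a simplification, not a BiLSTM); what it does illustrate is reading the sequence forward and in reverse and concatenating the two final states. All names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_pass(seq: List[Vector], w_x: Matrix, w_h: Matrix) -> Vector:
    """Run a simple tanh recurrent cell over seq and return the final hidden state."""
    h = [0.0] * len(w_h)
    for x in seq:
        h = [math.tanh(a + b) for a, b in zip(matvec(w_x, x), matvec(w_h, h))]
    return h


def bidirectional_encode(
    seq: List[Vector], fwd: Tuple[Matrix, Matrix], bwd: Tuple[Matrix, Matrix]
) -> Vector:
    """Concatenate the final states of a forward pass and a backward (reversed) pass."""
    h_forward = rnn_pass(seq, *fwd)
    h_backward = rnn_pass(list(reversed(seq)), *bwd)
    return h_forward + h_backward
```

The concatenated vector would then go to the fully connected layer and softmax output layer of Task 2.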

[0120] Understandably, to give Task 1 and Task 2 equal importance, a separate loss function can be set for each task, and the two loss functions are added together to obtain the loss function of the initial model, as shown in expression (1).

[0121] $$l_{\text{joint}} = l_c + l_i \qquad (1)$$

[0122] Here, $l_{\text{joint}}$ denotes the loss function of the initial model, $l_c$ denotes the loss function of Task 1, and $l_i$ denotes the loss function of Task 2.
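Expression (1) only fixes that the two task losses are summed with equal weight; it does not say which per-task loss is used. The sketch assumes cross-entropy on each task's softmax output, which is a common choice for classification heads; the function names are hypothetical.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Negative log-likelihood of the true class under a softmax output (assumed per-task loss)."""
    return -math.log(max(probs[label], 1e-12))


def joint_loss(p_task1: Sequence[float], y_task1: int, p_task2: Sequence[float], y_task2: int) -> float:
    """l_joint = l_c + l_i, with equal (unit) weights for the two tasks as in expression (1)."""
    l_c = cross_entropy(p_task1, y_task1)
    l_i = cross_entropy(p_task2, y_task2)
    return l_c + l_i
```

With uniform two-class outputs for both tasks, each term equals ln 2, so the joint loss is 2 ln 2.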

[0123] Therefore, the parameters of the model can be adjusted by using the loss function corresponding to the initial model, and the target model can then be obtained.

[0124] Training samples can be obtained, for example, by manually annotating multi-turn dialogue datasets, or determined using the method described in A4).

[0125] A4): In some embodiments, corresponding prompts are constructed in advance, and multi-turn dialogue data meeting the requirements is generated by a preset large language model (LLM). The multi-turn dialogue data includes multiple question-answer pairs (e.g., the pair formed by Q01 and A01, or a sequence such as Q01, A01, and Q02). Once the multi-turn dialogue data is obtained, it can be split into multiple training samples according to dialogue intent. Each training sample can include question-answer pairs that all correspond to the same dialogue intent (e.g., the split question-answer pairs are grouped into training samples by performing semantic association classification on them).
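A simplified version of the splitting step groups consecutive question-answer pairs that carry the same dialogue intent. The sketch assumes each pair already has an intent label and uses only that label (the semantic association classification mentioned above is omitted); the tuple layout and function name are hypothetical.

```python
from typing import List, Tuple

QAPair = Tuple[str, str, str]  # (question, answer, dialogue intent)


def split_into_samples(dialogue: List[QAPair]) -> List[List[Tuple[str, str]]]:
    """Group consecutive question-answer pairs that share a dialogue intent into one training sample."""
    samples: List[List[Tuple[str, str]]] = []
    prev_intent = None
    for question, answer, intent in dialogue:
        if intent != prev_intent:
            samples.append([])  # intent changed: start a new training sample
        samples[-1].append((question, answer))
        prev_intent = intent
    return samples
```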

[0126] Therefore, the constructed training samples are less likely to introduce redundant information, and since only question-answer pairs need to be constructed during the construction of training samples, the difficulty of constructing training samples can be reduced, thus saving the construction cost of training samples.

[0127] For ease of understanding, the relationship between the aforementioned “multi-turn dialogue data”, “training samples” and “dialogue pairs” will be described below using mathematical expressions (2) and (3).

[0128]

[0129] In expression (2), "$\mathrm{Session}_A$" denotes a training sample obtained from a multi-turn dialogue generated by the LLM (e.g., a task-oriented multi-turn dialogue for completing task A; a task-oriented multi-turn dialogue for completing task B gives $\mathrm{Session}_B$, and so on). Each training sample includes multiple question-answer pairs (Q, A), as given in expression (3): for $\mathrm{Session}_A$, these are the 1st, 2nd, ..., nth question-answer pairs of the task-oriented multi-turn dialogue for completing task A, and for $\mathrm{Session}_B$, the 1st, ..., nth question-answer pairs of the task-oriented multi-turn dialogue for completing task B, where n denotes the number of question-answer pairs in $\mathrm{Session}_A$. In each question-answer pair (Q, A), "Q" denotes the question content of that pair (e.g., the question content of the 1st, 2nd, ..., nth pair of the dialogue for task A, or of the 1st, ..., nth pair of the dialogue for task B), and "A" denotes the response content corresponding to that "Q".

[0130] In expression (3), "DIALOGUE" denotes a multi-turn dialogue generated by an LLM (which may be referred to here as the first LLM), and i denotes the turn of a question-answer pair (e.g., its position in time).

[0131] Following the embodiments related to A4) and based on the training samples constructed above, the objective of model training can be understood as determining, given the historical context, whether the current question belongs to the training sample of the previous round (i.e., is part of the question-answer pairs of $\mathrm{Session}_A$) or is the initial question of a new training sample (i.e., the first question-answer pair of $\mathrm{Session}_B$). Therefore, to obtain training samples from multi-turn dialogues, it is necessary to determine both the relevance between contexts and whether the intents of the contexts are consistent.

[0132] For ease of understanding, example A5) below illustrates how training samples are obtained by splitting a multi-turn dialogue.

[0133] A5) Assume that the following multi-turn dialogue (Q11-Q13) is obtained from the LLM:

[0134] Q11: Hello, I would like to know about your best-selling products.

[0135] A11: Hello, thank you very much for your interest in our products. Our best-selling product is product A.

[0136] Q12: What are the reasons for it becoming the best-selling product?

[0137] A12: Because technology B used in product A is very practical.

[0138] Q13: Thank you.

[0139] For the multi-turn dialogue Q11-Q13 above, the intent of the question-and-answer content Q11/A11 can be determined to be "data query", the intent of Q12/A12 to be "causal analysis", and the intent of Q13 to be "casual conversation". This shows that during a multi-turn dialogue, the intent may change from one question to the next (e.g., across Q11, Q12, and Q13). Furthermore, answering Q12 with A12 requires the content of Q11 and A11. Therefore, for the model to answer questions accurately, it must consider the correlation between the current question and the question-and-answer content of previous turns. That is, when splitting a multi-turn dialogue into training samples, it is necessary to determine both the correlation between contexts and whether the contextual intent switches.
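Labels for the two tasks can be derived turn by turn from such a dialogue: the intent-switch label follows directly from comparing adjacent intents, while the association label must come from annotation. The sketch assumes each turn is a dict with `question`, `answer`, `intent`, and `related` keys (a hypothetical layout). For A5), Q12 is marked related because answering it needs Q11 and A11; the `related` value used for Q13 in the test is a hypothetical annotation.

```python
from typing import Dict, List


def build_labeled_examples(turns: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Create one example per turn after the first, paired with the previous turn's Q&A.

    'related' (semantic association with the previous turn) is assumed to come from annotation;
    'switched' is derived by comparing the intents of adjacent turns.
    """
    examples = []
    for prev, cur in zip(turns, turns[1:]):
        examples.append({
            "history": (prev["question"], prev["answer"]),
            "question": cur["question"],
            "related": cur["related"],
            "switched": cur["intent"] != prev["intent"],
        })
    return examples
```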

[0140] Therefore, for ease of understanding, Table 3 below will provide an example of how to construct training samples.

[0141] Table 3

[0142]

[0143] In Table 3 above, "whether there is a semantic relationship in the context" can be understood as whether a semantic association exists between the first question content and the second question content, and "whether the intent is switched" can be understood as whether the first intent and the second intent are the same. If the first intent and the second intent are the same, the intent has not switched; if they are different, the intent has switched.

[0144] It should be noted that in some scenarios where the contexts have no semantic association, it is often difficult to answer the current question from the historical context. Therefore, when constructing training samples, it is sufficient to consider only the semantic association between contexts and whether the contextual intent has switched.

[0145] Therefore, the initial model can be trained with the training samples constructed above to obtain the target model. In some scenarios, the target model can then be used to determine whether a semantic association exists between the first question content and the second question content, and whether the first and second intents are the same, so that the target intent is determined from the results.

[0146] It is understood that, based on Table 2 and the related embodiments described above, in case A3) the target intent is generated based on the first intent and the second question content. This can be implemented, for example, as in the embodiment of Figure 4A described below.

[0147] Figure 4A is a flowchart illustrating a target intent generation method according to an exemplary embodiment. The target intent generation method includes the following steps.

[0148] In step S41, a preset LLM is invoked to rewrite the first question content based on the second question content, resulting in the rewritten question content.

[0149] In step S42, the intent corresponding to the rewritten question content is determined and taken as the target intent.
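Steps S41 and S42 compose two components: an LLM-based rewriter and an intent classifier. The sketch keeps both as injected callables because neither interface is specified; `rewrite_llm` and `classify_intent` are hypothetical placeholders, not a real LLM or classifier API.

```python
from typing import Callable


def target_intent_by_rewrite(
    first_question: str,
    second_question: str,
    rewrite_llm: Callable[[str, str], str],
    classify_intent: Callable[[str], str],
) -> str:
    """S41: rewrite the current question using the previous one; S42: classify the rewrite."""
    rewritten = rewrite_llm(first_question, second_question)
    return classify_intent(rewritten)
```

Injecting the two components keeps the flow testable with simple stand-ins and leaves the choice of LLM and classifier open.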

[0150] In this embodiment, since the rewritten question content is a rewrite of the first question content based on the second question content, it allows the rewritten content to establish a closer relationship with the prior dialogue content. Because the target intent is determined based on the rewritten question content, it makes the target intent more closely aligned with the prior dialogue content, thereby making the determined intent for the current round of questions more accurate and improving the user's dialogue experience.

[0151] It should be noted that the preset LLM invoked here may be referred to as the second LLM. Rewriting the first question content based on the second question content can be understood as rewriting the current question, in light of the preceding dialogue, into text that is clearer (or more easily recognized by the model), so that the target intent can be determined accurately from the rewritten text.

[0152] Understandably, the embodiment related to Figure 4A above can be executed, for example, after the output of the target model is obtained (e.g., as the "subsequent processing" mentioned in the embodiments related to Figure 2B).

[0153] For ease of understanding, the above embodiments are illustrated below with reference to Figure 4B. Figure 4B is a schematic diagram illustrating a scenario for determining a target intent according to an exemplary embodiment. As shown in Figure 4B, when the result output by the target model corresponds to case A3), the second LLM can be invoked to rewrite the first question content based on the second question content (or the (N-1)th question-and-answer content). The intent corresponding to the rewritten question content is then determined and used as the target intent, which in turn is used as the intent of the Nth question (i.e., the multi-turn intent result in the figure).

[0154] It should be noted that the steps for determining the output of the target model (i.e., the steps enclosed by the dashed box in the figure) are described in the embodiments related to Figure 2B and are not repeated here.

[0155] Invoking the preset LLM to rewrite the first question content based on the second question content can be implemented, for example, as described in A6).

[0156] A6) In some embodiments, a prompt is constructed based on the second question content (e.g., the second question content itself or the (N-1)th question-and-answer content) and the first question content, and the first question content is rewritten using the constructed prompt and the second LLM to obtain the rewritten question content.

[0157] For ease of understanding, an implementation for rewriting the first question content is illustrated below with reference to Figure 4C.

[0158] Figure 4C is a schematic diagram illustrating a method for determining rewritten question content according to an exemplary embodiment. As shown in Figure 4C, once the first and second question content are obtained, a corresponding prompt is constructed from them (i.e., the prompt construction shown in the figure). Prompt construction may include, as shown in the figure, instruction construction and/or example construction. The instruction part specifies, for the second LLM, the task to be completed, the input content and its format, and the output content and its format. The example part can be implemented using techniques such as zero-shot prompting, few-shot prompting, or chain-of-thought (CoT) prompting. The constructed prompt is then input into the second LLM to obtain the rewritten question content.
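The two parts of prompt construction can be assembled as plain text: an instruction block (task, input, output format) and an optional block of worked examples, so that an empty example list gives a zero-shot prompt and a non-empty one a few-shot prompt. The wording and layout below are hypothetical and not taken from the figure.

```python
from typing import Sequence, Tuple


def build_rewrite_prompt(
    history: str,
    current_question: str,
    examples: Sequence[Tuple[str, str, str]] = (),
) -> str:
    """Assemble an instruction part and an optional few-shot example part into one prompt."""
    parts = [
        # Instruction construction: task, input description, output description
        "Task: rewrite the current question into a self-contained question, using the dialogue history.",
        "Input: the dialogue history and the current question.",
        "Output: only the rewritten question, as a single line of text.",
    ]
    # Example construction: each example is (history, current question, rewritten question)
    for i, (ex_history, ex_question, ex_rewrite) in enumerate(examples, start=1):
        parts.append(f"Example {i}:\nHistory: {ex_history}\nCurrent question: {ex_question}\nRewritten: {ex_rewrite}")
    parts.append(f"History: {history}\nCurrent question: {current_question}\nRewritten:")
    return "\n\n".join(parts)
```

Ending the prompt with `Rewritten:` leaves the LLM only the rewritten question to complete, which matches the single-line output format stated in the instruction part.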

[0159] Understandably, different users may phrase their questions differently. For example, different users may use different keywords in their questions, and from these keywords the user's intent can be inferred with a certain probability.

[0160] For example, consider the multi-turn dialogue Q21-Q22:

[0161] Q21: What is the sales volume of product A? A21: Product A accounts for B% of total sales. Q22: What about product C?

[0162] In the above multi-turn dialogue, for the question Q22 "What about product C?", the phrase "What about" indicates that the user's current question may correspond to a query intent, and that this intent may be the same as or similar to that of the previous question Q21.

[0163] Therefore, in some scenarios, a mapping between relevant information in the question content (such as cue words) and intents can be pre-configured according to how users express themselves. Once the user's question is received, the target intent corresponding to the question content can then be determined by identifying that information (such as cue words) in the question content.

[0164] For example, in some scenarios, by pre-configuring the mapping relationship between relevant information in the question content and the pre-configured question-and-answer mode (hereinafter referred to as the preset question-and-answer mode), and then configuring the association relationship between the preset question-and-answer mode and the intent, it is possible to determine the preset question-and-answer mode corresponding to the question content based on the user's input question content, and then determine the target intent corresponding to the question content.
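The two-level configuration in [0164] (cue information to preset question-and-answer mode, then mode to intent) can be sketched as two lookup tables. Every cue word, mode name, and intent below is a hypothetical configuration value chosen to fit the Q21-Q22 example, and plain substring matching is a simplification of cue identification.

```python
from typing import Dict, Optional

# Hypothetical configuration; real cue words, modes, and intents would be set per product.
CUE_TO_MODE: Dict[str, str] = {
    "what about": "follow-up substitution",
    "why": "cause inquiry",
}
MODE_TO_INTENT: Dict[str, str] = {
    "follow-up substitution": "data query",
    "cause inquiry": "causal analysis",
}


def match_mode(question: str) -> Optional[str]:
    """Return the preset question-and-answer mode whose cue word appears in the question, if any."""
    text = question.lower()
    for cue, mode in CUE_TO_MODE.items():
        if cue in text:
            return mode
    return None  # the question does not match any preset mode


def intent_from_mode(question: str) -> Optional[str]:
    mode = match_mode(question)
    return MODE_TO_INTENT.get(mode) if mode else None
```

A `None` result corresponds to the situation in [0166], where the question is outside every preset mode and the intent has to be determined another way.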

[0165] For example, in some scenarios, by pre-configuring the mapping relationship between relevant information in the question content and the preset question-and-answer mode, and then configuring the association relationship between the preset question-and-answer mode and the intent modification rules, it is possible to determine the preset question-and-answer mode and the intent corresponding to the question content based on the user's input question content. Then, the intent corresponding to the question content can be modified according to the modification rules corresponding to the preset question-and-answer mode to obtain the target intent.
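The two-level configuration described in [0164] and [0165] (question content, then preset question-and-answer mode, then intent correction rule) might be organized as below. This is a hypothetical sketch: the mode names, the representation of a correction rule as a plain function over an intent string, and the intent labels are assumptions; the "So?" and "Ah?" expressions follow the examples given in [0185].

```python
from typing import Callable, Dict, Optional

IntentRule = Callable[[str], str]

# Level 1: question expression -> preset question-and-answer mode (hypothetical).
EXPRESSION_TO_MODE: Dict[str, str] = {
    "so?": "mode_advice",
    "ah?": "mode_cause",
}

# Level 2: preset mode -> intent correction rule (hypothetical rules).
MODE_TO_RULE: Dict[str, IntentRule] = {
    "mode_advice": lambda intent: f"give_advice({intent})",
    "mode_cause": lambda intent: f"analyze_causes({intent})",
}


def corrected_intent(question: str, first_intent: str) -> Optional[str]:
    """Correct first_intent using the rule of the matched preset mode.

    Returns None when the question matches no preset mode.
    """
    mode = EXPRESSION_TO_MODE.get(question.strip().lower())
    if mode is None:
        return None
    rule = MODE_TO_RULE[mode]  # look up the correction rule for this mode
    return rule(first_intent)
```

Keeping the two levels in separate tables means a new expression can be mapped to an existing mode without touching the correction rules.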

[0166] Understandably, if the obtained first question does not fall within any preset question-and-answer pattern, its intent cannot be determined from the pre-configured mapping. Therefore, for a given first question, the question-and-answer pattern corresponding to the first question (or the Nth question) must first be determined, and the target intent is then determined by a method that depends on the result.

[0167] To facilitate understanding, examples of cases A7) and B7) will be used below to illustrate how to determine the target intent.

[0168] A7) In some embodiments, the question-and-answer pattern of the Nth question is determined based on the first question content, and the determined pattern is a preset question-and-answer pattern.

[0169] In this case, the target intent can be determined, for example, by the method shown in Figure 5. Figure 5 is a flowchart illustrating a method for determining a target intent according to an exemplary embodiment. The method shown in Figure 5 includes the following steps.

[0170] In step S51, the intent correction rule corresponding to the preset question-and-answer mode is determined.

[0171] In step S52, the first intent corresponding to the first question content is corrected based on the intent correction rule, and the corrected intent is taken as the target intent.

[0172] Here, the preset question-and-answer mode corresponds to an intent correction rule, which is used to rewrite the first intent.

[0173] In this embodiment, because the first intent is determined solely from the first question content, the way it is expressed may hinder recognition by subsequent models. Correcting the first intent with a preset intent correction rule and using the corrected intent as the target intent therefore increases the likelihood that subsequent models recognize the target intent accurately, which improves the user experience.

[0174] In this way, the target intent can be determined when the first question content corresponds to a preset question-and-answer pattern.

[0175] B7) In some embodiments, the question-and-answer pattern of the Nth question is determined based on the first question content, and the determined pattern is another pattern (i.e., a question-and-answer pattern different from the preset question-and-answer pattern).

[0176] In this case, for example, based on the content of the first question and the content of the second question, it can be determined whether there is a semantic relationship between the content of the first question and the content of the second question, and whether the first intent and the second intent are the same, and the target intent can be determined based on the corresponding results.

[0177] Understandably, for B7), the process of determining the target intent is described in detail in the embodiments related to Figure 1, Table 1, Table 2, Figure 2A, or Figure 2B above, and will not be repeated here.

[0178] Therefore, by using A7) and B7), the target intent in different question-and-answer modes can be determined. This, in turn, enables the determination of the Nth question intent in different modes, thereby improving the user's conversational experience.

[0179] The question-and-answer pattern corresponding to the first question (or the Nth question) can be determined, for example, as shown in Figure 6A. Figure 6A is a flowchart illustrating a method for determining the question-and-answer pattern of the Nth question according to an exemplary embodiment. The method includes the following steps.

[0180] In step S61, the first keyword of the first question content is determined.

[0181] In step S62-1, if the first keyword is a keyword in the preset matching template, then the question-and-answer mode corresponding to the Nth question is determined to be the preset question-and-answer mode.

[0182] In step S62-2, if the first keyword is not a keyword in the preset matching template, then the question-and-answer mode corresponding to the Nth question is determined to be another question-and-answer mode.

[0183] The first keyword includes keywords that indicate tone and/or keywords that indicate intent. Other question-and-answer patterns are those different from the preset question-and-answer patterns.

[0184] In this embodiment of the disclosure, by identifying the first keyword in the first question content and checking it against the matching template, the question-and-answer pattern corresponding to the first question content can be determined more efficiently.

[0185] In some embodiments, the matching template between relevant information in the question content and preset question-and-answer patterns can be set according to the different ways users phrase their questions. For example, for a question phrased as "So?", the matching template can specify a first preset question-and-answer pattern, which may correspond to the intent of "giving advice". As another example, for a question phrased as "Ah?", the matching template can specify a second preset question-and-answer pattern, which may correspond to the intent of "analyzing causes".
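Steps S61 to S62-2, using a matching template like the one in [0185], might be sketched as follows. The naive keyword extractor (every lowercase word token is a candidate first keyword), the single template shared by tone keywords and intent keywords, the extra intent keyword "suggest", and the mode names `first_preset`, `second_preset`, and `other` are all assumptions made for illustration.

```python
import re
from typing import Dict, List

# Hypothetical matching template: keyword -> preset question-and-answer mode.
# Tone keywords (interjections) and intent keywords share one template here.
MATCHING_TEMPLATE: Dict[str, str] = {
    "so": "first_preset",       # tone keyword, e.g. "So?" -> giving advice
    "ah": "second_preset",      # tone keyword, e.g. "Ah?" -> analyzing causes
    "suggest": "first_preset",  # intent keyword (hypothetical)
}
OTHER_MODE = "other"


def extract_keywords(question: str) -> List[str]:
    """S61: naive keyword extraction as lowercase word tokens."""
    return re.findall(r"[a-z]+", question.lower())


def qa_mode_of(question: str,
               template: Dict[str, str] = MATCHING_TEMPLATE) -> str:
    """S62-1 / S62-2: preset mode if a keyword is in the template, else other."""
    for keyword in extract_keywords(question):
        if keyword in template:
            return template[keyword]
    return OTHER_MODE
```

A question such as "What about product C?" contains no template keyword and therefore falls into the other mode, which corresponds to case B7).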

[0186] Therefore, the above embodiments can be used to determine the question-and-answer pattern of the Nth question, and then the target intent can be determined by adopting the corresponding method based on the determined result.

[0187] To facilitate understanding, the overall process of the above scheme is described below with reference to Figure 6B. Figure 6B is a schematic diagram of an intent recognition scenario according to an exemplary embodiment.

[0188] As shown in Figure 6B, once the first question content is obtained, intent recognition can be performed on it to obtain the corresponding first intent (for example, using a pre-trained intent recognition model or a related intent recognition algorithm). In addition, the question-and-answer pattern of the Nth question can be determined based on the first question (the determination method is described in the embodiments above and will not be repeated here), and it can be determined whether this pattern matches a preset question-and-answer pattern (each preset question-and-answer pattern corresponds to a preset intent correction rule).

[0189] If there is a match, the first intent is corrected (e.g., modified to the intent of the matched preset question-and-answer pattern), and the corrected intent is used as the intent corresponding to the Nth question. In some scenarios, the first intent can also be corrected based on the previous question-and-answer content (e.g., the content of the (N-1)th question and answer in the figure) together with the intent correction rule corresponding to the matched pattern. This allows the corrected intent to be recognized accurately by the model while relating more closely to the context. The corrected intent is then used as the intent corresponding to the Nth question.

[0190] If there is no match, the first question content and the content of the (N-1)th question and answer are input into the target model, which executes the corresponding recognition tasks (e.g., task 1 and task 2 in the figure). The target intent is then determined in the manner corresponding to the recognition result (e.g., one of the situations A3 to D3 described above; Figure 6B takes A3 as an example) and is taken as the intent corresponding to the Nth question. The situations corresponding to the recognition results, and the method for determining the target intent in each, have been described in detail in the related embodiments above and will not be repeated here.
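The overall flow of Figure 6B can be summarized as a single dispatch function. In this minimal sketch the intent recognizer, the pattern matcher, the target-model relation check, and the target-intent generator are passed in as callables, because the disclosure does not fix their implementations. Only the related-and-different branch (claim 1) and the related-and-same branch (claim 3) are taken from the text; returning the first intent when the questions are unrelated is an assumption standing in for the situations described in the earlier embodiments.

```python
from typing import Callable, Optional, Tuple

Rule = Callable[[str], str]  # intent correction rule: intent -> corrected intent


def recognize_nth_intent(
    first_question: str,
    prev_qa: Tuple[str, str],                       # (second question, its response)
    recognize: Callable[[str], str],                # question -> first intent
    match_rule: Callable[[str], Optional[Rule]],    # question -> rule of matched preset pattern
    relation: Callable[[str, Tuple[str, str]], Tuple[bool, bool]],  # -> (related, same intent)
    generate_target: Callable[[str, str], str],     # (first intent, second question) -> target
) -> str:
    first_intent = recognize(first_question)

    # Match: correct the first intent with the preset pattern's rule.
    rule = match_rule(first_question)
    if rule is not None:
        return rule(first_intent)

    # No match: let the target model judge relation and intent sameness.
    related, same_intent = relation(first_question, prev_qa)
    if related and not same_intent:
        return generate_target(first_intent, prev_qa[0])
    # Related with the same intent (claim 3), or unrelated (assumed here):
    # keep the first intent.
    return first_intent
```

Injecting the components as callables keeps the sketch independent of any particular model, so each branch can be exercised with simple stubs.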

[0191] Therefore, through the above-described embodiments, intent recognition in multi-turn dialogue scenarios can be achieved.

[0192] Based on the same concept, embodiments of this disclosure also provide an intent recognition device.

[0193] It is understood that the intent recognition device provided in this disclosure includes hardware structures and / or software modules corresponding to each function in order to achieve the above-mentioned functions. In conjunction with the units and algorithm steps of the various examples disclosed in this disclosure, this disclosure can be implemented in hardware or a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the technical solutions of this disclosure.

[0194] Figure 7 is block diagram one of an intent recognition device according to an exemplary embodiment. Referring to Figure 7, the device 100 includes an acquisition unit 101 and a processing unit 102.

[0195] The acquisition unit 101 is used to acquire the content of the first question asked in the Nth question and the content of the second question asked in the (N-1)th question, where N is an integer greater than 1;

[0196] The processing unit 102 is configured to determine, based on the first question content and the second question content, whether there is a semantic relationship between them, and whether the first intent and the second intent are the same. If there is a semantic relationship between the first question content and the second question content and the first intent differs from the second intent, the processing unit 102 generates a target intent based on the first intent and the second question content, and uses the target intent as the intent of the Nth question.

[0197] Here, the first intent is the intent corresponding to the first question content, and the second intent is the intent corresponding to the second question content.

[0198] In some embodiments, the processing unit 102 determines whether there is a semantic relationship between the first question content and the second question content, and determines whether the first intent and the second intent are the same, in the following manner: obtaining the content of the (N-1)th question and answer, wherein the content of the (N-1)th question and answer includes the second question content and the response content corresponding to the second question content; calling the target model to extract semantic features from the first question content and the content of the (N-1)th question and answer, thereby obtaining the first semantic feature corresponding to the first question content and the second semantic feature corresponding to the content of the (N-1)th question and answer; and, based on the first semantic feature and/or the second semantic feature, determining whether there is a semantic relationship between the first question content and the second question content, and determining whether the first intent and the second intent are the same.
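The disclosure does not state how the relation and intent decisions are derived from the two semantic features. The sketch below swaps in one common choice purely for illustration: cosine similarity between the feature vectors compared against an assumed threshold of 0.5 for the semantic relationship, and a comparison of the argmax of hypothetical intent-classification logits for intent sameness. All names and the threshold are assumptions.

```python
import math
from typing import Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest value."""
    return max(range(len(xs)), key=lambda i: xs[i])


def relation_and_intent(first_feat: Sequence[float],
                        second_feat: Sequence[float],
                        first_intent_logits: Sequence[float],
                        second_intent_logits: Sequence[float],
                        threshold: float = 0.5) -> Tuple[bool, bool]:
    """Return (semantically related, same intent) for the two turns."""
    related = cosine(first_feat, second_feat) >= threshold
    same_intent = argmax(first_intent_logits) == argmax(second_intent_logits)
    return related, same_intent
```

The threshold would in practice be tuned on labeled multi-turn data rather than fixed in advance.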

[0199] In some embodiments, the processing unit 102 is further configured to: determine the first intent or the second intent as the intent of the user's Nth question if it is determined that there is a semantic association between the first question content and the second question content, and the first intent and the second intent are the same.

[0200] In some embodiments, the target model is trained as follows: training samples are input into an initial model, wherein each training sample includes a question-answer pair, the question-answer pair includes a question and a corresponding response, each training sample corresponds to an intent, and the training samples are obtained by segmenting multi-turn dialogues into question-answer pairs; semantic features are extracted from the training samples to obtain the semantic features corresponding to the training samples; the initial model is invoked to classify the training samples based on their semantic features, wherein classifying the training samples includes semantic classification and/or intent classification; the corresponding loss function is determined based on the classification results of the initial model for the training samples; and the initial model is adjusted based on the loss function to obtain the target model.
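Two pieces of this training procedure lend themselves to a short sketch: segmenting multi-turn dialogues into question-answer pairs, and combining the semantic-classification and intent-classification results into one loss. The disclosure specifies neither the segmentation rule nor the loss, so the sketch assumes dialogues given as alternating `("user", text)` / `("assistant", text)` turns and swaps in softmax cross-entropy per classification head, combined as a weighted sum with assumed equal weights. The parameter update itself would be left to a training framework and is not shown.

```python
import math
from typing import List, Sequence, Tuple

Turn = Tuple[str, str]  # (speaker, text); speaker is "user" or "assistant"


def segment_qa_pairs(dialogue: Sequence[Turn]) -> List[Tuple[str, str]]:
    """Split a multi-turn dialogue into (question, response) training pairs."""
    pairs: List[Tuple[str, str]] = []
    pending = None  # user question still waiting for a response
    for speaker, text in dialogue:
        if speaker == "user":
            pending = text
        elif speaker == "assistant" and pending is not None:
            pairs.append((pending, text))
            pending = None
    return pairs


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def joint_loss(sem_logits: Sequence[float], sem_label: int,
               intent_logits: Sequence[float], intent_label: int,
               w_sem: float = 0.5, w_intent: float = 0.5) -> float:
    """Weighted sum of the semantic and intent classification losses."""
    return (w_sem * cross_entropy(sem_logits, sem_label)
            + w_intent * cross_entropy(intent_logits, intent_label))
```

Setting one weight to zero reduces the sketch to training with only semantic classification or only intent classification, matching the "and/or" in the text.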

[0201] In some embodiments, the processing unit 102 generates the target intent based on the first intent and the second question content in the following manner: calling a preset LLM to rewrite the first question content based on the second question content to obtain rewritten question content; and determining the intent corresponding to the rewritten question content and taking that intent as the target intent.

[0202] In some embodiments, the processing unit 102 is further configured to: determine the question-and-answer pattern of the Nth question based on the content of the first question, and determine that the question-and-answer pattern of the Nth question is another pattern, wherein the other pattern is different from the preset question-and-answer pattern.

[0203] In some embodiments, the processing unit 102 determines the question-and-answer pattern of the Nth question based on the content of the first question in the following manner: determining the first keyword of the first question content; if the first keyword is a keyword in a preset matching template, then determining the question-and-answer pattern corresponding to the Nth question as a preset question-and-answer pattern; if the first keyword is not a keyword in a preset matching template, then determining the question-and-answer pattern corresponding to the Nth question as another question-and-answer pattern, wherein the other question-and-answer pattern is different from the preset question-and-answer pattern.

[0204] In some embodiments, when a preset question-and-answer mode corresponds to a preset intent correction rule, the processing unit 102 determines the question-and-answer mode of the Nth question based on the content of the first question in the following manner: when the question-and-answer mode is a preset question-and-answer mode, the intent correction rule corresponding to the preset question-and-answer mode is determined; the first intent corresponding to the content of the first question is corrected based on the intent correction rule, and the corrected intent is taken as the target intent.

[0205] Regarding the apparatus in the above embodiments, the specific manner in which each module performs its operation has been described in detail in the embodiments related to the method, and will not be elaborated upon here.

[0206] Figure 8 is block diagram two of an intent recognition device according to an exemplary embodiment. The device 200 can be provided as a terminal for performing the intent recognition method. For example, the device 200 can be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, fitness equipment, personal digital assistant, etc.

[0207] Referring to Figure 8, the device 200 may include one or more of the following components: processing component 202, memory 204, power component 206, multimedia component 208, audio component 210, input/output (I/O) interface 212, sensor component 214, and communication component 216.

[0208] Processing component 202 typically controls the overall operation of device 200, such as operations associated with display, telephone calls, data communication, camera operation, and recording. Processing component 202 may include one or more processors 220 to execute instructions to perform all or part of the steps of the methods described above. Furthermore, processing component 202 may include one or more modules to facilitate interaction between processing component 202 and other components. For example, processing component 202 may include a multimedia module to facilitate interaction between multimedia component 208 and processing component 202.

[0209] Memory 204 is configured to store various types of data to support the operation of device 200. Examples of such data include instructions for any application or method operating on device 200, contact data, phonebook data, messages, pictures, videos, etc. Memory 204 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.

[0210] The power supply component 206 provides power to the various components of the device 200. The power supply component 206 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power to the device 200.

[0211] Multimedia component 208 includes a screen that provides an output interface between the device 200 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touchscreen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundaries of the touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, multimedia component 208 includes a front-facing camera and/or a rear-facing camera. When the device 200 is in an operating mode, such as a shooting mode or a video mode, the front-facing camera and/or the rear-facing camera may receive external multimedia data. Each of the front-facing and rear-facing cameras may be a fixed optical lens system or have focal length and optical zoom capabilities.

[0212] Audio component 210 is configured to output and/or input audio signals. For example, audio component 210 includes a microphone (MIC) configured to receive external audio signals when device 200 is in an operating mode, such as call mode, recording mode, or voice recognition mode. The received audio signals may be further stored in memory 204 or transmitted via communication component 216. In some embodiments, audio component 210 also includes a speaker for outputting audio signals.

[0213] I/O interface 212 provides an interface between processing component 202 and peripheral interface modules, such as keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to, home buttons, volume buttons, power buttons, and lock buttons.

[0214] Sensor assembly 214 includes one or more sensors for providing status assessments of various aspects of device 200. For example, sensor assembly 214 may detect the on/off state of device 200, the relative positioning of components such as the display and keypad of device 200, changes in the position of device 200 or a component of device 200, the presence or absence of user contact with device 200, the orientation or acceleration/deceleration of device 200, and temperature changes of device 200. Sensor assembly 214 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. Sensor assembly 214 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, sensor assembly 214 may also include an accelerometer, a gyroscope, a magnetometer, a pressure sensor, or a temperature sensor.

[0215] Communication component 216 is configured to facilitate wired or wireless communication between device 200 and other devices. Device 200 can access wireless networks based on communication standards, such as WiFi, 2G, or 3G, or combinations thereof. In one exemplary embodiment, communication component 216 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, communication component 216 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

[0216] In an exemplary embodiment, the apparatus 200 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the methods described above.

[0217] In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as a memory 204 including instructions, which can be executed by a processor 220 of the device 200 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, and optical data storage device, etc.

[0218] Figure 9 is block diagram three of an intent recognition device according to an exemplary embodiment. For example, device 300 may be provided as a server. Referring to Figure 9, the device 300 includes a processing component 322, which further includes one or more processors, and memory resources represented by a memory 332 for storing instructions, such as application programs, executable by the processing component 322. The application programs stored in the memory 332 may include one or more modules, each corresponding to a set of instructions.

[0219] Device 300 may also include a power supply component 326 configured to perform power management of device 300, a wired or wireless network interface 350 configured to connect device 300 to a network, and an input/output (I/O) interface 358. Device 300 may operate based on an operating system stored in memory 332, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or similar.

[0220] Based on the same concept, this disclosure also provides a computer program product, wherein the computer program product includes a computer program. This computer program can be executed by a processor, and when executed by the processor, it can perform any of the intent recognition methods described above.

[0221] It is understood that in this disclosure, "multiple" refers to two or more, and other quantifiers are to be understood similarly. "And/or" describes an association between related objects and indicates that three relationships may exist; for example, A and/or B can represent A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it. The singular forms "a," "said," and "the" are also intended to include the plural forms unless the context clearly indicates otherwise.

[0222] It is further understood that the terms "first," "second," etc., are used to describe various types of information, but this information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another, and do not indicate a specific order or degree of importance. In fact, the expressions "first," "second," etc., are completely interchangeable. For example, without departing from the scope of this disclosure, first information can also be referred to as second information, and similarly, second information can also be referred to as first information.

[0223] It is further understood that the terms “center,” “longitudinal,” “lateral,” “front,” “rear,” “up,” “down,” “left,” “right,” “vertical,” “horizontal,” “top,” “bottom,” “inner,” and “outer,” etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings. They are only for the convenience of describing this embodiment and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation.

[0224] It can be further understood that, unless otherwise specified, "connection" includes both direct connections where no other components exist between the two parties and indirect connections where other components exist between them.

[0225] It is further understood that although operations are described in a specific order in the accompanying drawings in the embodiments of this disclosure, this should not be construed as requiring these operations to be performed in the specific order or serial order shown, or requiring all of the shown operations to be performed to obtain the desired result. In certain environments, multitasking and parallel processing may be advantageous.

[0226] Other embodiments of this disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the invention that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not disclosed herein.

[0227] It should be understood that this disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this disclosure is limited only by the appended claims.

Claims

1. An intent recognition method, characterized by comprising: obtaining the content of a first question asked in the Nth question and the content of a second question asked in the (N-1)th question, where N is an integer greater than 1; determining, based on the first question content and the second question content, whether there is a semantic relationship between the first question content and the second question content, and determining whether a first intent and a second intent are the same, wherein the first intent is the intent corresponding to the first question content, and the second intent is the intent corresponding to the second question content; and in response to there being a semantic relationship between the first question content and the second question content and the first intent being different from the second intent, generating a target intent based on the first intent and the second question content, and using the target intent as the intent of the Nth question.

2. The method according to claim 1, characterized in that determining whether there is a semantic relationship between the first question content and the second question content, and determining whether the first intent and the second intent are the same, comprises: obtaining the content of the (N-1)th question and answer, wherein the content of the (N-1)th question and answer includes the second question content and the corresponding response content; invoking the target model to extract semantic features from the first question content and the content of the (N-1)th question and answer, thereby obtaining a first semantic feature corresponding to the first question content and a second semantic feature corresponding to the content of the (N-1)th question and answer; and determining, based on the first semantic feature and/or the second semantic feature, whether there is a semantic relationship between the first question content and the second question content, and whether the first intent and the second intent are the same.

3. The method according to claim 1, characterized in that the method further comprises: in response to determining that there is a semantic relationship between the first question content and the second question content and that the first intent and the second intent are the same, determining the first intent or the second intent as the intent of the user's Nth question.

4. The method according to claim 2, characterized in that the target model is trained as follows: inputting training samples into an initial model, wherein each training sample includes a question-answer pair, the question-answer pair includes question content and corresponding response content, each training sample corresponds to an intent, and the training samples are obtained by segmenting multi-turn dialogues into question-answer pairs; extracting semantic features from the training samples to obtain the semantic features corresponding to the training samples; invoking the initial model to classify the training samples based on their semantic features, wherein classifying the training samples includes semantic classification and/or intent classification of the training samples; determining a corresponding loss function based on the classification results of the initial model for the training samples; and adjusting the initial model based on the loss function to obtain the target model.

5. The method according to claim 1, characterized in that generating the target intent based on the first intent and the second question content comprises: invoking a preset large language model (LLM) to rewrite the first question content based on the second question content to obtain rewritten question content; and determining the intent corresponding to the rewritten question content and taking that intent as the target intent.

6. The method according to claim 1, characterized in that, after obtaining the content of the first question of the Nth question and the content of the second question of the (N-1)th question input by the user, the method further comprises: determining the question-and-answer pattern of the Nth question based on the first question content, and determining that the question-and-answer pattern of the Nth question is another pattern, wherein the other question-and-answer pattern is different from the preset question-and-answer pattern.

7. The method according to claim 6, characterized in that determining the question-answer mode of the Nth question based on the first question content includes: determining a first keyword of the first question content; if the first keyword is a keyword in a preset matching template, determining that the question-answer mode of the Nth question is the preset question-answer mode; and if the first keyword is not a keyword in the preset matching template, determining that the question-answer mode of the Nth question is another question-answer mode, wherein the other question-answer mode is different from the preset question-answer mode.
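The keyword test of claims 6 and 7 reduces to a set-membership check. The following is a minimal sketch, assuming that the first keyword is simply the first token that is not a stop word, and that the preset matching template is a small keyword set. The template contents, stop words, and function names are hypothetical.

```python
import string

PRESET_MODE = "preset"
OTHER_MODE = "other"

# Hypothetical preset matching template and stop-word list.
PRESET_TEMPLATE_KEYWORDS = {"cancel", "pause", "instead"}
STOP_WORDS = {"please", "oh", "um", "ok"}


def extract_first_keyword(question: str) -> str:
    """Assumed rule: the first token, punctuation stripped, that is not a stop word."""
    for raw in question.lower().split():
        token = raw.strip(string.punctuation)
        if token and token not in STOP_WORDS:
            return token
    return ""


def determine_mode(question: str) -> str:
    """Preset mode if the keyword is in the template, otherwise the other mode."""
    keyword = extract_first_keyword(question)
    return PRESET_MODE if keyword in PRESET_TEMPLATE_KEYWORDS else OTHER_MODE
```

For example, "Please pause the music" falls into the preset mode, whereas "What about Shanghai?" falls into the other mode and proceeds to the semantic-association check of claim 1.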

8. The method according to claim 6 or 7, characterized in that the preset question-answer mode corresponds to a preset intent correction rule, and determining the question-answer mode of the Nth question based on the first question content includes: if the question-answer mode is the preset question-answer mode, determining the intent correction rule corresponding to the preset question-answer mode; and correcting, based on the intent correction rule, the first intent corresponding to the first question content, and taking the corrected intent as the target intent.
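The correction step of claim 8 can be pictured as a lookup from the matched template keyword to a rule that rewrites the first intent. This sketch assumes intent labels of the hypothetical form `<action>_<domain>` (for example `play_music`) and a rule that keeps the domain while replacing the action. The labels, `replace_action`, and `CORRECTION_RULES` are illustrative only.

```python
from typing import Callable, Dict, Optional

# Hypothetical intent labels of the form "<action>_<domain>", e.g. "play_music".
CorrectionRule = Callable[[str], str]


def replace_action(action: str) -> CorrectionRule:
    """Builds a rule that keeps the intent's domain and swaps in a new action."""
    def rule(intent: str) -> str:
        _, sep, domain = intent.partition("_")
        return f"{action}_{domain}" if sep else intent  # labels without "_" pass through
    return rule


# Hypothetical mapping from template keyword to its intent correction rule.
CORRECTION_RULES: Dict[str, CorrectionRule] = {
    "pause": replace_action("pause"),
    "cancel": replace_action("cancel"),
}


def correct_intent(mode: str, keyword: str, first_intent: str) -> Optional[str]:
    """Returns the corrected target intent, or None when the mode is not preset."""
    if mode != "preset":
        return None  # handled instead by the semantic-association path of claim 1
    rule = CORRECTION_RULES.get(keyword)
    return rule(first_intent) if rule else first_intent
```

For instance, if "Pause the music" were first classified as `play_music` because of the word "music", the rule for "pause" would correct it to `pause_music`.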

9. An intent recognition device, characterized in that it includes: an acquisition unit configured to acquire the first question content of the Nth question and the second question content of the (N-1)th question, wherein N is an integer greater than 1; and a processing unit configured to determine, based on the first question content and the second question content, whether a semantic association exists between the first question content and the second question content and whether a first intent and a second intent are the same, and, in response to a semantic association existing between the first question content and the second question content and the first intent and the second intent being different, to generate a target intent based on the first intent and the second question content and use the target intent as the intent of the Nth question; wherein the first intent is the intent corresponding to the first question content, and the second intent is the intent corresponding to the second question content.
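The division of labor between the two units of claim 9 can be sketched as below. The association check, intent classifier, and target-intent generator are injected callables because the claim does not fix them; the toy versions in the usage part (`classify`, `is_related`, `generate`) are hypothetical heuristics. Two behaviors are assumptions not stated in the claim: when the condition is not met, the first intent is kept, and on the first turn (no (N-1)th question yet) the question is simply classified.

```python
from typing import Callable, List, Tuple


class AcquisitionUnit:
    """Stores the user's questions in order and returns the Nth and (N-1)th."""

    def __init__(self) -> None:
        self.questions: List[str] = []

    def add(self, question: str) -> None:
        self.questions.append(question)

    def acquire(self) -> Tuple[str, str]:
        if len(self.questions) < 2:
            raise ValueError("need at least two questions (N > 1)")
        return self.questions[-1], self.questions[-2]


class ProcessingUnit:
    """Applies the claim-9 decision with injected helper functions."""

    def __init__(
        self,
        is_related: Callable[[str, str], bool],
        classify: Callable[[str], str],
        generate: Callable[[str, str], str],
    ) -> None:
        self.is_related = is_related
        self.classify = classify
        self.generate = generate

    def process(self, first_question: str, second_question: str) -> str:
        first_intent = self.classify(first_question)
        second_intent = self.classify(second_question)
        if self.is_related(first_question, second_question) and first_intent != second_intent:
            # Target intent from the first intent and the second question content.
            return self.generate(first_intent, second_question)
        return first_intent  # assumption: otherwise keep the first intent


class IntentRecognitionDevice:
    def __init__(self, processing_unit: ProcessingUnit) -> None:
        self.acquisition_unit = AcquisitionUnit()
        self.processing_unit = processing_unit

    def recognize(self, question: str) -> str:
        self.acquisition_unit.add(question)
        if len(self.acquisition_unit.questions) == 1:
            return self.processing_unit.classify(question)  # assumption: first turn
        first, second = self.acquisition_unit.acquire()
        return self.processing_unit.process(first, second)


# Toy helpers for illustration only.
def classify(question: str) -> str:
    text = question.lower()
    if "weather" in text:
        return "weather"
    if text.startswith("play"):
        return "music"
    return "unknown"


def is_related(first: str, second: str) -> bool:
    # Treats elliptical follow-ups such as "What about X?" as associated.
    return first.lower().startswith(("what about", "how about"))


def generate(first_intent: str, second_question: str) -> str:
    # An unresolved follow-up inherits the previous question's intent.
    return classify(second_question) if first_intent == "unknown" else first_intent


device = IntentRecognitionDevice(ProcessingUnit(is_related, classify, generate))
```

Feeding "What is the weather in Beijing?" and then "What about Shanghai?" yields `weather` for the second turn, while an unrelated "Play some jazz" afterwards keeps its own `music` intent.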

10. An electronic device, characterized in that it includes: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to execute the intent recognition method according to any one of claims 1 to 8.

11. A storage medium, characterized in that the storage medium stores instructions which, when executed by a processor, enable the processor to perform the intent recognition method according to any one of claims 1 to 8.

12. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the intent recognition method according to any one of claims 1 to 8.