Intention recognition method, device and storage medium
The method receives user input and dialogue history, determines whether semantic completion is needed, rewrites the user input with a first model when it is, and performs collaborative decision-making with an intent recognition fusion model that is dynamically configured through a visual orchestration interface. When the intent is ambiguous, a hierarchical follow-up questioning strategy is triggered. This addresses the difficulty of adapting multi-model fusion solutions to rapidly iterating business scenarios and personalized needs, and provides efficient, accurate intent recognition and multi-turn dialogue management.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA MERCHANTS BANK
- Filing Date
- 2026-03-24
- Publication Date
- 2026-06-19
AI Technical Summary
Existing multi-model fusion solutions cannot flexibly adapt to the rapid iteration of business scenarios and personalized needs. This leads to a sharp increase in the customization costs of model switching, parameter tuning, and scenario adaptation, which seriously reduces the efficiency of integration and of operation and maintenance. A standardized solution also struggles to cover the complex requirements of all scenarios, which forms the core bottleneck of the intent recognition hub.
By receiving user input and dialogue history, it determines whether semantic completion is needed, rewrites the user input using the first model, and dynamically configures the intent recognition fusion model through a visual orchestration interface for collaborative decision-making. When the intent is ambiguous, it triggers a hierarchical follow-up questioning strategy, generates follow-up questioning information, and updates the dialogue history based on user feedback to achieve multi-round interaction.
We have built an intent recognition hub system that can dynamically adapt to multiple business scenarios and accurately understand complex dialogue intents. This system improves the naturalness, accuracy, and maintainability of human-computer interaction, achieves low-cost adaptation and flexible scheduling, and ensures the contextual coherence and semantic accuracy of multi-turn dialogues.
Figure CN122240784A_ABST
Description
Technical Field
[0001] This application relates to the field of interactive query technology, and in particular to an intent recognition method, device and storage medium.
Background Technology
[0002] With the rapid development of artificial intelligence technology, intent recognition, as a core component of natural language processing, has become a crucial technological foundation for building intelligent dialogue systems and achieving natural human-computer interaction. In numerous vertical business sectors such as finance, customer service, and marketing, the intent recognition hub needs to simultaneously serve multiple business scenarios, including customer profiling, product consultation, and business processing. Different scenarios place varying demands on recognition accuracy, speed, and intent granularity. To address this complexity, the industry commonly employs a multi-model collaborative approach, integrating the strengths of rule-based models, small models, and large models to achieve a balance between accuracy, efficiency, and generalization ability, collectively completing intent decision-making.
[0003] However, most existing multi-model fusion solutions adopt fixed, hard-coded collaborative frameworks, where the scheduling logic, weight allocation, and execution order between different models are fixed after system deployment. This "static" fusion mechanism exposes significant limitations when facing rapid iterations and personalized needs in business scenarios: when new businesses are integrated or existing businesses adjust their recognition requirements, the system cannot adapt flexibly, leading to a sharp increase in the customization costs of model switching, parameter tuning, and scenario adaptation. This not only severely reduces the efficiency of integration and maintenance but also makes it difficult for a standardized solution to cover the complex requirements of all scenarios, forming a core bottleneck for the intent recognition hub in large-scale applications.
[0004] The above content is only used to help understand the technical solution of this application and does not represent an admission that the above content is prior art.
Summary of the Invention
[0005] The main purpose of this application is to provide an intent recognition method, device and storage medium, which aims to solve the technical problem of how to realize intent recognition in complex dialogue interactions in multi-service scenarios.
[0006] To achieve the above objectives, this application proposes an intent recognition method, the method comprising:
[0007] receiving user input for the current round; determining, based on the user input and context information retrieved from the dialogue history, whether semantic completion of the user input is needed; when semantic completion is needed, rewriting the user input using a first model to generate completed user input; performing collaborative decision-making on the completed user input, or on the user input before rewriting, using an intent recognition fusion model that is dynamically configured through a visual orchestration interface, to obtain an initial intent recognition result; when the initial intent recognition result indicates that the user's intent is ambiguous, triggering a hierarchical follow-up questioning strategy to generate follow-up questioning information and output it to the user; and receiving the user's feedback input in the next round and updating the dialogue history based on the feedback input to complete the multi-round interaction.
[0008] In one embodiment, the step of determining whether semantic completion of the user input is needed includes: performing semantic integrity detection on the user input to obtain an integrity score; performing semantic coherence detection on the user input and the previous round of user input in the dialogue history to obtain a coherence score; and feeding the integrity score, the coherence score, and preset custom rules into a low-loss judge, which outputs a Boolean judgment result indicating whether semantic completion is needed.
[0009] In one embodiment, the step of rewriting the user input using a first model to generate the completed user input includes: triggering the first model when the Boolean judgment result is "yes"; extracting key elements from the dialogue history and obtaining business domain knowledge from a business knowledge base; concatenating the user input, the dialogue history, the key elements, and the business domain knowledge into a dynamic prompt, and inputting the prompt into the large language model that serves as the first model; receiving the completed user input output by the large language model; and inputting the completed user input into a validator to verify its rationality and authenticity. If the validation passes, the completed user input is output; if the validation fails, the completed user input is discarded and the original user input is output.
[0010] In one embodiment, the step of triggering the hierarchical follow-up questioning strategy, generating follow-up questioning information, and outputting it to the user includes: inputting the initial intent recognition result into a fuzzy intent detector to determine the ambiguity type of the initial intent recognition result; if the ambiguity type is intent ambiguity, triggering an intent clarification sub-process to generate follow-up questions that confirm or guide the intent; if the ambiguity type is slot ambiguity, inputting the initial intent recognition result into a slot analyzer, which outputs a list of missing slots or a list of low-confidence slots; and inputting the list of missing slots or the list of low-confidence slots into a dialogue generator, which calls a generative model based on a large language model to dynamically generate natural language follow-up questions for the slots.
[0011] In one embodiment, the step of dynamically generating natural language follow-up questions for the slots includes: obtaining a pre-generated slot clarification template, which defines the structured relationship between word slots, interface slots, and slot groups; inputting the slot clarification template and the list of slots to be clarified into the generative model based on the large language model; and receiving, as the follow-up questioning information, the natural language follow-up questions output by the generative model that match the slots to be clarified.
[0012] In one embodiment, the method further includes iteratively optimizing the intent recognition fusion model, specifically including: collecting low-confidence data from online interaction logs as a dataset to be reflected upon; inputting each piece of data in the dataset to be reflected upon into multiple different large language model experts; receiving the intent recognition result output by each large language model expert, voting on the multiple intent recognition results, and taking the intent with the most votes as the labeled answer for the corresponding data, thereby generating a high-confidence reflection dataset; automatically feeding the reflection dataset into the model training set and triggering automatic, periodic model training; and automatically deploying the updated model parameters to the production environment, replacing the corresponding model in the original intent recognition fusion model.
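The voting step in the paragraph above can be sketched as follows. This is a minimal illustration, not the application's implementation: the experts are plain callables standing in for large language models, and the `min_agreement` cutoff is an assumed safeguard that the source does not specify.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical stand-in: an "expert" is any callable mapping text to an intent label.
Expert = Callable[[str], str]

def label_by_vote(text: str, experts: list[Expert], min_agreement: float = 0.5) -> Optional[str]:
    """Return the intent with the most votes, or None when agreement is too low."""
    votes = Counter(expert(text) for expert in experts)
    intent, count = votes.most_common(1)[0]
    # Assumption: require a strict majority before trusting the label.
    return intent if count / len(experts) > min_agreement else None

def build_reflection_dataset(low_confidence_texts: list[str], experts: list[Expert]) -> list[dict]:
    """Keep only the samples on which the experts reach sufficient agreement."""
    dataset = []
    for text in low_confidence_texts:
        label = label_by_vote(text, experts)
        if label is not None:
            dataset.append({"text": text, "intent": label})
    return dataset

# Toy experts for demonstration only.
experts = [lambda t: "query_bill", lambda t: "query_bill", lambda t: "query_limit"]
reflection = build_reflection_dataset(["how much do I owe"], experts)
```

Samples on which the experts split evenly are dropped rather than labeled, which keeps the reflection dataset high-confidence at the cost of some coverage.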
[0013] In one embodiment, the step of performing collaborative decision-making on the completed user input or the user input before rewriting, using the intent recognition fusion model dynamically configured through the visual orchestration interface, to obtain the initial intent recognition result includes: inputting the reflection dataset into a rule mining tool, which invokes the large language model to automatically extract high-frequency similar questions from the reflection dataset; converting the high-frequency similar questions into expert rules and storing them in an expert rule base; and calling the rules in the expert rule base first for fast matching, so that when a match is found, the intent recognition result is output directly at an early stage of intent recognition.
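The rule-first shortcut described above can be illustrated with a small sketch. The rule base, the regular expressions, and the intent labels are hypothetical; the source does not state how expert rules are represented.

```python
import re
from typing import Callable, Optional

# Hypothetical expert rule base: (compiled pattern, intent label) pairs.
EXPERT_RULES = [
    (re.compile(r"credit card (bill|statement)", re.I), "query_credit_card_statement"),
    (re.compile(r"(report|lost).*card", re.I), "report_lost_card"),
]

def match_expert_rule(text: str) -> Optional[str]:
    """Return the intent of the first matching rule, or None if no rule matches."""
    for pattern, intent in EXPERT_RULES:
        if pattern.search(text):
            return intent
    return None

def recognize(text: str, fusion_model: Callable[[str], str]) -> dict:
    """Try the rule base first; fall back to the slower fusion model on a miss."""
    intent = match_expert_rule(text)
    if intent is not None:
        return {"intent": intent, "source": "expert_rule"}
    return {"intent": fusion_model(text), "source": "fusion_model"}

hit = recognize("I want to check my credit card statement", lambda t: "unknown")
miss = recognize("hello", lambda t: "chitchat")
```

A rule hit returns before any model is called, which is where the early-exit latency saving comes from.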
[0014] In one embodiment, the step of dynamically configuring the intent recognition fusion model through a visual orchestration interface includes: displaying a list of multiple available intent recognition models through the visual orchestration interface; receiving the user's drag-and-drop and connection operations on the multiple intent recognition models on the visual orchestration interface, and configuring the execution order of the multiple intent recognition models accordingly; receiving the recall threshold, weight parameters, and execution priority set by the user for each intent recognition model on the visual orchestration interface; and encapsulating the execution order, recall thresholds, weight parameters, and execution priorities into a configuration file that serves as the runtime configuration of the intent recognition fusion model; wherein the intent recognition models include at least two of the following: an expert rule-based model, a deep learning-based intent classification mini-model, and an intent recognition model based on a large language model.
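One way to picture the encapsulation step is the sketch below, which turns canvas nodes and connections into a JSON configuration. The `ModelNode` fields, the JSON layout, and the use of priority as a tie-breaker are assumptions made for illustration; the source does not define the file format.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ModelNode:
    name: str                # hypothetical model identifier
    recall_threshold: float
    weight: float
    priority: int            # assumption: lower value runs earlier among ready nodes

def execution_order(nodes, edges):
    """Order nodes so that every connection (src, dst) runs src before dst."""
    indegree = {n.name: 0 for n in nodes}
    for _, dst in edges:
        indegree[dst] += 1
    priority = {n.name: n.priority for n in nodes}
    ready = sorted((name for name, d in indegree.items() if d == 0), key=priority.get)
    order = []
    while ready:
        current = ready.pop(0)
        order.append(current)
        for src, dst in edges:
            if src == current:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
        ready.sort(key=priority.get)
    if len(order) != len(nodes):
        raise ValueError("connections form a cycle")
    return order

def build_runtime_config(nodes, edges) -> str:
    """Serialize the canvas state into a JSON configuration string."""
    return json.dumps({
        "execution_order": execution_order(nodes, edges),
        "models": {n.name: asdict(n) for n in nodes},
    }, indent=2)

nodes = [
    ModelNode("llm", 0.3, 0.3, 2),
    ModelNode("expert_rules", 0.9, 0.5, 0),
    ModelNode("small_classifier", 0.5, 0.2, 1),
]
edges = [("expert_rules", "small_classifier"), ("small_classifier", "llm")]
config = json.loads(build_runtime_config(nodes, edges))
```

Because the result is plain data, a running service could reload it without code changes, which matches the dynamic configuration goal described in the text.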
[0015] In addition, to achieve the above objectives, this application also proposes an intent recognition device, the device comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the intent recognition method as described above.
[0016] In addition, to achieve the above objectives, this application also proposes a storage medium. The storage medium is a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the intent recognition method described above.
[0017] One or more technical solutions proposed in this application have at least the following technical effects: This application receives user input in the current round; based on the user input and context information retrieved from the dialogue history, it determines whether semantic completion of the user input is needed; when semantic completion is needed, the user input is rewritten using a first model to generate completed user input; an intent recognition fusion model dynamically configured through a visual orchestration interface is used to make collaborative decisions on either the completed or unrewritten user input to obtain an initial intent recognition result; when the initial intent recognition result indicates that the user's intent is ambiguous, a hierarchical follow-up questioning strategy is triggered to generate follow-up questioning information and output it to the user; feedback input from the user in the next round is received, and the dialogue history is updated based on the feedback input to complete multi-round interaction. This application achieves low-cost adaptation and flexible scheduling across multiple business scenarios through an intent recognition fusion model that is dynamically configurable via a visual orchestration interface; it determines whether semantic completion is needed based on user input and dialogue history, and rewrites the user input using the first model to achieve contextual coherence and semantic accuracy in multi-turn dialogues; it achieves human-like questioning and efficient slot filling under ambiguous intents by triggering a hierarchical follow-up questioning strategy; and it enables the model's self-evolution, constructing an intent recognition hub system that can dynamically adapt to multiple business scenarios, accurately understand complex dialogue intents, and continuously self-optimize, significantly improving the naturalness, accuracy, and maintainability of human-computer interaction.
Attached Figure Description
[0018] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.
[0019] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0020] Figure 1 is a flowchart illustrating the first embodiment of the intent recognition method of this application; Figure 2 is a flowchart illustrating Embodiment 2 of the intent recognition method of this application; Figure 3 is a flowchart illustrating Embodiment 3 of the intent recognition method of this application; Figure 4 is a flowchart illustrating Embodiment 4 of the intent recognition method of this application; Figure 5 is a flowchart illustrating Embodiment 5 of the intent recognition method of this application; Figure 6 is a schematic diagram of the device structure of the hardware operating environment involved in the intent recognition method in the embodiments of this application.
[0021] The purpose, features, and advantages of this application will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.
Detailed Implementation
[0022] It should be understood that the specific embodiments described herein are merely illustrative of the technical solutions of this application and are not intended to limit this application.
[0023] To better understand the technical solution of this application, a detailed description will be provided below in conjunction with the accompanying drawings and specific implementation methods.
[0024] Because existing technologies cannot adapt flexibly when new businesses are integrated or existing businesses adjust their recognition requirements, the customization costs of model switching, parameter tuning, and scenario adaptation rise sharply. This not only seriously reduces the efficiency of integration and of operation and maintenance, but also makes it difficult for a standardized solution to cover the complex requirements of all scenarios, forming the core bottleneck of the intent recognition hub in large-scale applications.
[0025] This application provides a solution that achieves low-cost adaptation and flexible scheduling across multiple business scenarios through an intent recognition fusion model that is dynamically configurable via a visual orchestration interface; it determines whether semantic completion is needed based on user input and dialogue history, and rewrites the user input using the first model to achieve contextual coherence and semantic accuracy in multi-turn dialogues; it achieves human-like questioning and efficient slot filling under ambiguous intents by triggering a hierarchical follow-up questioning strategy; and it enables the model to self-evolve, constructing an intent recognition hub system that can dynamically adapt to multiple business scenarios, accurately understand complex dialogue intents, and continuously self-optimize, significantly improving the naturalness, accuracy, and maintainability of human-computer interaction.
[0026] Based on this, embodiments of this application provide an intent recognition method. Referring to Figure 1, Figure 1 is a flowchart illustrating the first embodiment of the intent recognition method of this application.
[0027] In this embodiment, the intent recognition method includes steps S10 to S60. Step S10: Receive user input for the current round. It should be noted that in this embodiment, user input refers to queries, commands, or questions entered by the user in the form of voice or text during the interaction between the user and the intent recognition system. The current round refers to the latest round of interaction relative to previous rounds in a multi-round dialogue. The specific form of user input can be natural language text or text data obtained by converting the user's speech through a speech recognition module. The intent recognition system continuously listens for or waits for user input. When it detects that the user has initiated a new round of dialogue, the system acquires the complete input content of that round as the basis for subsequent semantic understanding and intent recognition processing.
[0028] Optionally, user input can be acquired through various methods such as application programming interfaces (APIs), front-end interactive interfaces, or message queues. After receiving user input, the system temporarily stores it in a memory buffer and associates it with a unique identifier and round number of the current dialogue for context tracing in subsequent steps. For example, in one specific implementation, when a bank's customer service robot interacts with a user, the user enters "I want to check my credit card statement" through a mobile banking client. The system receives this text input and records it as the user input for the current round, while also recording the current dialogue session ID and round number.
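The buffering described above might look like the following sketch. The class and field names are hypothetical, and an in-memory list stands in for whatever buffer the system actually uses.

```python
import uuid
from dataclasses import dataclass
from typing import Optional

@dataclass
class TurnInput:
    session_id: str   # unique identifier of the dialogue session
    round_no: int     # 1-based round number within the session
    text: str

class InputBuffer:
    """In-memory buffer that tags each input with its session and round."""

    def __init__(self):
        self._last_round = {}   # session_id -> last round number seen
        self.pending = []       # inputs waiting for downstream processing

    def receive(self, text: str, session_id: Optional[str] = None) -> TurnInput:
        # Assumption: a missing session_id means a new session is starting.
        session_id = session_id or uuid.uuid4().hex
        round_no = self._last_round.get(session_id, 0) + 1
        self._last_round[session_id] = round_no
        turn = TurnInput(session_id, round_no, text.strip())
        self.pending.append(turn)
        return turn

buffer = InputBuffer()
first = buffer.receive("I want to check my credit card statement", "session-1")
second = buffer.receive("What about last month?", "session-1")
```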
[0029] Step S20: Based on the user input and the context information retrieved from the dialogue history, determine whether semantic completion of the user input is required. It should be noted that, in this embodiment, dialogue history refers to the collection of all historical interaction records that have occurred before the current round in the current dialogue session, including user input from previous rounds, system-output responses, identified intent information, and filled slot information. Contextual information refers to key information fragments extracted from the dialogue history that are related to the semantic understanding of the current user input, such as entity names mentioned by the user in the previous round, incomplete slot information, or a list of intent candidates to be confirmed. Semantic completion refers to the process by which the system restores the user input to a complete semantic expression by referencing contextual information when the user omits some information, uses pronouns, or makes incomplete statements in the current round of input. The criteria for determining whether semantic completion is needed include: whether the user input contains pronouns or other referring expressions, whether there are obvious semantic jumps, and whether there is a logical connection with the content of the historical dialogue. In this embodiment, the system first performs basic natural language analysis on the current user input to identify pronouns, omitted components, and semantic breakpoints; at the same time, it retrieves previous interaction records related to the current session from the dialogue history memory; then, it compares the semantic relevance of the user input with the retrieved context information and calculates the semantic integrity score and coherence score; finally, it outputs a decision result on whether semantic completion is needed based on preset judgment rules or trained judgment models.
[0030] In one possible implementation, the system can employ a lightweight rule engine combined with a machine learning classifier to perform semantic completion judgment. The rule engine is responsible for handling high-frequency, explicit completion scenarios, while the machine learning classifier is responsible for handling complex, ambiguous semantic association scenarios. For example, in a specific implementation, if a user inputs "What is my credit card limit?" in the first round, the system replies "Your credit card limit is 50,000 yuan." If the user inputs "What about the cash withdrawal limit?" in the second round, after receiving this input, the system, based on the contextual information "credit card" in the dialogue history, determines that "What about the cash withdrawal limit?" omits the key qualifier "credit card" and needs semantic completion to "What is the credit card cash withdrawal limit?"
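A toy version of this judgment is sketched below. The scoring heuristics (a pronoun list, elliptical openers, word overlap) and both thresholds are assumptions chosen only to make the example runnable; the source leaves the actual detectors and the judge unspecified.

```python
import re

# Assumed cue lists; a real system would use a trained detector.
PRONOUNS = {"it", "its", "that", "this", "they", "them", "those"}
ELLIPSIS_OPENERS = ("what about", "how about", "and ")

def _words(text: str) -> set:
    return set(re.findall(r"[a-z']+", text.lower()))

def integrity_score(text: str) -> float:
    """1.0 means self-contained; lower values suggest omitted content."""
    score = 1.0
    if _words(text) & PRONOUNS:
        score -= 0.5
    if text.lower().startswith(ELLIPSIS_OPENERS):
        score -= 0.5
    return max(score, 0.0)

def coherence_score(text: str, previous: str) -> float:
    """Word-set overlap (Jaccard) between this turn and the previous one."""
    a, b = _words(text), _words(previous)
    return len(a & b) / len(a | b) if a | b else 0.0

def needs_completion(text, previous, custom_rules=(), integrity_threshold=0.6,
                     coherence_threshold=0.1) -> bool:
    # Custom rules run first; a rule returns True/False to decide, or None to abstain.
    for rule in custom_rules:
        verdict = rule(text, previous)
        if verdict is not None:
            return verdict
    if not previous:
        return False  # nothing in the history to complete from
    # Incomplete on its own, yet still related to the previous turn.
    return (integrity_score(text) < integrity_threshold
            and coherence_score(text, previous) >= coherence_threshold)

decision = needs_completion("What about the cash withdrawal limit?",
                            "What is my credit card limit?")
```

With the example from the text, the second-round question opens elliptically and shares vocabulary with the first round, so the sketch returns a positive decision.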
[0031] Step S30: When semantic completion of the user input is required, the user input is rewritten using the first model to generate the completed user input. It should be noted that the first model refers to a deep learning model with natural language understanding and generation capabilities. Specifically, it can be a large language model based on the Transformer architecture, or a sequence-to-sequence model specifically trained for semantic completion tasks. Rewriting means that the first model, based on the current user input and contextual information, uses its semantic understanding and generation mechanisms to supplement and clarify the omitted, pronominal, or ambiguous parts of the user input, forming a semantically complete text. The completed user input refers to the standardized text obtained after rewriting, which is semantically complete and independent of the context. In this embodiment, when the judgment result of step S20 indicates that semantic completion is required, the system triggers the call flow of the first model. Specifically, the system concatenates the current user input, contextual information extracted from the dialogue history, relevant terminology definitions in the business domain knowledge base, and a preset rewriting instruction template into a prompt for the model according to a predefined format; the prompt is input into the first model, which generates the rewritten text based on its pre-trained knowledge and contextual understanding capabilities; the system receives the text output by the first model as the completed user input and passes it to the subsequent intent recognition step.
[0032] In one possible implementation, the system can also perform a secondary verification of the rewritten result output by the first model. For example, it can use a rule verification module to check that the rewritten text does not introduce factual errors, that it retains the core intent of the original input, and that it does not contain expressions beyond the scope of the business. If the verification passes, the rewritten result is adopted; if the verification fails, the user input before rewriting is used as a fallback. For example, in a specific implementation, the user inputs "What about its expiration date?", with the context being "credit card" mentioned in the previous round. The first model receives the prompt "User input: What about its expiration date?; Context: credit card; Please complete the question." The model outputs "How long is the credit card valid for?", and the system receives this output as the completed user input.
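The rewrite-then-verify flow can be sketched as below. The prompt layout follows the example in the text; the `model` argument is a hypothetical callable standing in for the first model, and the validator checks are deliberately simple assumptions rather than the application's verification rules.

```python
def build_prompt(user_input, context, domain_terms=None,
                 instruction="Please complete the question."):
    """Concatenate input, context, and optional domain knowledge into one prompt."""
    parts = [f"User input: {user_input}", f"Context: {'; '.join(context)}"]
    if domain_terms:
        parts.append("Domain knowledge: "
                     + "; ".join(f"{k} = {v}" for k, v in domain_terms.items()))
    parts.append(instruction)
    return "\n".join(parts)

def validate(rewritten, context, banned_phrases=()):
    """Assumed checks: non-empty, grounded in the context, and within business scope."""
    text = rewritten.lower()
    grounded = any(item.lower() in text for item in context)
    in_scope = not any(phrase.lower() in text for phrase in banned_phrases)
    return bool(rewritten.strip()) and grounded and in_scope

def complete(user_input, context, model, domain_terms=None, banned_phrases=()):
    """Return the validated rewrite, or the original input if validation fails."""
    rewritten = model(build_prompt(user_input, context, domain_terms))
    return rewritten if validate(rewritten, context, banned_phrases) else user_input

# Stub models used in place of a real large language model.
good_model = lambda prompt: "How long is the credit card valid for?"
bad_model = lambda prompt: "You should buy stocks now."

accepted = complete("What about its expiration date?", ["credit card"], good_model)
rejected = complete("What about its expiration date?", ["credit card"], bad_model)
```

Falling back to the original input means a bad rewrite can never make the downstream input worse than it was before rewriting.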
[0033] Step S40: Through the intent recognition fusion model dynamically configured by the visual orchestration interface, collaborative decision-making is performed on the completed user input or the user input before rewriting to obtain the initial intent recognition result. It should be noted that the visual orchestration interface refers to a graphical user interface provided to business personnel or system maintenance personnel. This interface supports dynamic orchestration and real-time adjustment of the calling order, weight allocation, and threshold settings of multiple intent recognition models through visual operations such as drag-and-drop, connecting lines, and parameter configuration. An intent recognition fusion model refers to a composite model system composed of at least two different types of intent recognition models, specifically including different types such as expert rule-based intent recognition models, deep learning-based intent classification mini-models, and zero-shot intent recognition models based on large language models. Dynamic configuration means that the composition structure of the intent recognition fusion model, the collaborative relationships between models, and the operating parameters of each model can be modified and applied in real time through the visual orchestration interface without modifying the underlying code or restarting the system service. Collaborative decision-making means that multiple intent recognition models independently recognize the input text, and the system comprehensively evaluates and votes on the output results of each model according to a preset fusion strategy, ultimately forming a unified intent recognition conclusion. The initial intent recognition result refers to the intent label, confidence score, and intermediate information generated during the recognition process output after multi-model collaborative decision-making.
[0034] Optionally, the system first obtains the completed user input from the output of step S30. If step S30 is not triggered, the user input before rewriting is used. Then, based on the currently active configuration file in the visual orchestration interface, the system determines the list of models to be called, the calling order, the weight coefficients of each model, and the decision fusion rules for this intent recognition task. The system sequentially calls each model to process the input text and collects the intent recognition results output by each model. The system performs a comprehensive analysis of the collected results according to the configured fusion strategy, such as using weighted voting, confidence ranking, multi-level verification, etc., to determine the final intent label as the initial intent recognition result.
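Weighted voting, one of the fusion strategies named above, can be sketched as follows. The configuration layout, the model stubs, and the normalization of scores into a confidence value are assumptions for illustration.

```python
from collections import defaultdict

def fuse(text, models, config):
    """Call each configured model in order and combine results by weighted voting."""
    scores = defaultdict(float)
    for name in config["execution_order"]:
        params = config["models"][name]
        intent, confidence = models[name](text)
        if confidence < params["recall_threshold"]:
            continue  # this model's result is too weak to count as recalled
        scores[intent] += params["weight"] * confidence
    if not scores:
        return {"intent": None, "confidence": 0.0, "candidates": {}}
    total = sum(scores.values())
    # Assumption: normalize so candidate scores sum to 1 and are comparable.
    candidates = {intent: score / total for intent, score in scores.items()}
    best = max(candidates, key=candidates.get)
    return {"intent": best, "confidence": candidates[best], "candidates": candidates}

# Stub models returning (intent, confidence); real ones would be rules, a classifier, an LLM.
models = {
    "expert_rules": lambda t: ("query_credit_card", 0.9),
    "small_classifier": lambda t: ("query_debit_card", 0.6),
    "llm": lambda t: ("query_credit_card", 0.8),
}
config = {
    "execution_order": ["expert_rules", "small_classifier", "llm"],
    "models": {
        "expert_rules": {"weight": 0.5, "recall_threshold": 0.5},
        "small_classifier": {"weight": 0.2, "recall_threshold": 0.5},
        "llm": {"weight": 0.3, "recall_threshold": 0.5},
    },
}
result = fuse("I want to check my credit card", models, config)
```

Returning the full candidate distribution, not just the winner, gives a later ambiguity check the information it needs to compare close candidates.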
[0035] Step S50: When the initial intent recognition result indicates that the user's intent is ambiguous, a hierarchical follow-up questioning strategy is triggered to generate follow-up questioning information and output it to the user. It should be noted that ambiguous user intent refers to one or more of the following situations in the initial intent recognition result: the confidence level of the identified intent is lower than a preset threshold; multiple candidate intents have similar confidence levels and cannot be distinguished; the identified intent is clear, but the slot information required to complete the intent is missing or has low confidence; or the user input contains multiple possible intent directions. The hierarchical follow-up questioning strategy refers to dividing the follow-up questioning process into an intent clarification level and a slot clarification level based on the type of ambiguity, with different follow-up questioning logic and script generation methods configured within each level. The intent clarification level refers to the follow-up questioning method that provides the user with a list of candidate intents for confirmation or guides the user to rephrase their intent when the system cannot determine the user's specific intent. The slot clarification level refers to the follow-up questioning method that asks the user for missing slot information when the intent is clear but key information is lacking. Follow-up questioning information refers to a piece of natural language text generated by the system to ask the user questions and obtain more information.
[0036] Optionally, the system first analyzes the initial intent recognition results to determine if there is intent ambiguity. If so, it further analyzes whether the ambiguity belongs to the intent level or the slot level. If it is intent-level ambiguity, the system initiates an intent clarification process, such as selecting several of the most likely intents from the candidate intent list and generating follow-up questions to guide user confirmation. If it is slot-level ambiguity, the system initiates a slot clarification process, such as obtaining a list of currently missing necessary slots from the slot definitions corresponding to the intent and generating specific questions for each missing slot. The system outputs the generated follow-up information to the user through the interactive interface or speech synthesis module, awaiting the user's next round of feedback input.
[0037] In one possible implementation, the system employs a dynamic dialogue generator based on a large language model. Based on the current dialogue state, user profile information, and business context, it generates personalized, business-compliant natural language expressions for each follow-up question, avoiding the use of generic, fixed dialogue templates. For example, in a specific implementation, if a user inputs "I want to check my card," after multi-model collaborative decision-making, the system determines that the confidence scores for "query credit card information" and "query debit card information" are both 0.5, indicating ambiguity at the intent level. The system then triggers the intent clarification process in its hierarchical follow-up questioning strategy, generating the follow-up question "Do you want to check your credit card or debit card?" and pushing it to the user via the mobile banking client.
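The two-level dispatch can be sketched as follows. The thresholds, the result layout, and the template-based question builder are assumptions; in the described system a large language model would generate the wording instead of the fixed templates used here.

```python
def detect_ambiguity(result, required_slots, min_conf=0.7, margin=0.1, slot_conf=0.6):
    """Return (kind, items) where kind is "intent", "slot", or None."""
    ranked = sorted(result["candidates"].items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[0][1] if ranked else 0.0
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    # Intent level: low confidence, or the two best candidates are too close.
    if top < min_conf or top - runner_up < margin:
        return "intent", [name for name, _ in ranked[:3]]
    # Slot level: the intent is clear but a required slot is missing or weak.
    slots = result.get("slots", {})   # slot name -> (value, confidence)
    unclear = [s for s in required_slots.get(ranked[0][0], [])
               if slots.get(s, (None, 0.0))[1] < slot_conf]
    return ("slot", unclear) if unclear else (None, [])

def follow_up(kind, items, labels):
    """Template stand-in for the generative dialogue generator."""
    names = [labels.get(item, item) for item in items]
    if kind == "intent":
        return "Do you want to " + " or ".join(names) + "?"
    if kind == "slot":
        return "Could you tell me the " + " and ".join(names) + "?"
    return None

labels = {
    "query_credit_card_information": "check your credit card",
    "query_debit_card_information": "check your debit card",
}
result = {"candidates": {"query_credit_card_information": 0.5,
                         "query_debit_card_information": 0.5}}
kind, items = detect_ambiguity(result, required_slots={})
question = follow_up(kind, items, labels)
```

Checking the intent level before the slot level avoids asking for slot values that belong to an intent the user may not have meant.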
[0038] Step S60: Receive user feedback input in the next round, and update the dialogue history based on the feedback input to complete the multi-round interaction.
[0039] It should be noted that "next round" refers to the subsequent interaction round relative to the current processing round. Feedback input refers to the user's response or supplementary explanation after receiving follow-up questions from the system. Updating the dialogue history means storing the user's input from the current round, the system's follow-up questions, and the feedback input received in the next round in the dialogue history memory, in chronological order and logically related, as the contextual information basis for subsequent interaction rounds. Completing multiple rounds of interaction means that through multiple rounds of information exchange, the system can ultimately understand the user's intent and obtain all necessary information, thus entering the task execution phase or providing a final response.
[0040] Optionally, after outputting follow-up questions, the system enters a waiting state, continuously listening for user input in the next round. When user feedback is received, the system uses this feedback as the new input for the current round and repeats steps S10 to S60. Simultaneously, the system encapsulates the user input from the previous round, the follow-up questions output by the system, and the feedback received in the current round, adding a timestamp and round identifier, and stores it in the dialogue history database. During storage, the system performs structured processing on the dialogue history, extracting key information points such as confirmed intents, filled slot values, and unconfirmed ambiguities to form a structured dialogue state representation, facilitating rapid retrieval and utilization in subsequent rounds.
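One way to picture the stored record and the structured dialogue state is the sketch below. All class and field names (`Turn`, `DialogueState`, `DialogueHistory`) are hypothetical; the text only requires a timestamp, a round identifier, and key information points such as confirmed intents, filled slot values, and unconfirmed ambiguities.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Turn:
    round_id: int                   # round identifier
    user_input: str                 # what the user said in this round
    system_question: Optional[str]  # follow-up question output in this round, if any
    timestamp: str                  # ISO 8601 time at which the turn was stored


@dataclass
class DialogueState:
    confirmed_intent: Optional[str] = None
    filled_slots: Dict[str, str] = field(default_factory=dict)
    open_ambiguities: List[str] = field(default_factory=list)


@dataclass
class DialogueHistory:
    turns: List[Turn] = field(default_factory=list)
    state: DialogueState = field(default_factory=DialogueState)

    def append(self, user_input: str, system_question: Optional[str] = None) -> Turn:
        """Store one round in chronological order and return the stored record."""
        turn = Turn(
            round_id=len(self.turns) + 1,
            user_input=user_input,
            system_question=system_question,
            timestamp=datetime.now(timezone.utc).isoformat(),
        )
        self.turns.append(turn)
        return turn


history = DialogueHistory()
history.append("I want to check my card",
               "Do you want to check your credit card or debit card?")
history.append("credit card")
history.state.confirmed_intent = "query credit card information"
```

Keeping the structured state next to the raw turns lets later rounds read the confirmed intent directly instead of re-parsing the transcript.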
[0041] In one possible implementation, the system can also dynamically adjust the strategy for subsequent interactions based on accumulated dialogue history information. For example, when it detects that a user has repeatedly given unclear information for a certain slot, it can proactively provide examples or options for the user to choose from. For instance, if the user responds to the follow-up question from step S50, "Do you want to check your credit card or debit card?", by entering "credit card," the system receives this response and stores it in the dialogue history. The updated dialogue history includes: the first-round user input "I want to check my card," the first-round system follow-up question "Do you want to check your credit card or debit card?", and the second-round user input "credit card." Based on the updated dialogue history, the system continues processing. The intent is now clearly identified as "query credit card information," and the next step is to proceed to the slot clarification stage to ask what specific information the user wants to query.
[0042] This application determines whether semantic completion is needed and, if so, rewrites the user input using a first model. It then makes a collaborative decision using an intent recognition fusion model that is dynamically configured through a visual orchestration interface. When the intent is ambiguous, a hierarchical follow-up questioning strategy is triggered, and the dialogue history is updated based on user feedback, achieving accurate intent recognition in complex multi-turn interaction scenarios. Specifically, a context-aware semantic completion mechanism ensures the coherence and accuracy of multi-turn dialogues; a dynamic model fusion strategy configured through visual orchestration enables flexible adaptation to multiple business scenarios and low-cost operation and maintenance; and the hierarchical follow-up questioning technique improves interaction efficiency and user experience when intents are ambiguous. Ultimately, this yields an intelligent intent recognition system that is flexibly configurable, understands accurately, and can be continuously optimized.
[0043] Furthermore, referring to Figure 2, a flowchart of the second embodiment of the intent recognition method of this application is provided. Based on the above, the embodiment shown in Figure 2 further refines the step of "determining whether semantic completion of the user input is needed" in step S20, including steps A201 to A203: Step A201: Perform semantic integrity detection on the user input to obtain an integrity score. It should be noted that semantic integrity detection assesses whether the user input in the current round contains the essential components needed to independently express a complete meaning, including core verbs, key nouns, and necessary modifiers. The integrity score (also referred to below as the completeness score) is a quantitative measure of the semantic completeness of the user input; a higher score indicates that the user input is closer to a complete sentence that can be understood independently without context.
[0044] Optionally, the system first performs basic natural language processing operations such as word segmentation, part-of-speech tagging, and syntactic analysis on the user input to extract the core components of the sentence. Then, the system compares these core components with a preset semantic integrity template or an integrity assessment model trained on a corpus to calculate the completeness of the user input in terms of grammatical structure and semantic expression. Finally, the system outputs a numerical value within a preset range as an integrity score, such as a continuous value between 0 and 1, where 0 indicates extremely incomplete semantics and 1 indicates completely independent and complete semantics. In one possible implementation, the system can use a rule-based detection method, that is, predefine the set of core slots required for various intent sentences and detect whether the user input covers these core slots. In another possible implementation, the system can use a deep learning-based detection method, that is, train a binary or multi-class classification model, take the vector representation of the user input as input, and output the probability distribution of its integrity category as the integrity score.
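The rule-based variant can be sketched as a coverage ratio over predefined core slots. The intent name, the slot names, and the choice of "fraction of core slots covered" as the score are illustrative assumptions; the text only says that the system detects whether the input covers the core slots.

```python
from typing import Dict, Set

# Hypothetical core-slot definitions per intent.
CORE_SLOTS: Dict[str, Set[str]] = {
    "query_bill": {"card_type", "bill_month"},
}


def integrity_score(intent: str, detected_slots: Set[str],
                    core_slots: Dict[str, Set[str]] = CORE_SLOTS) -> float:
    """Fraction of the intent's core slots that the user input covers (0 to 1)."""
    required = core_slots.get(intent, set())
    if not required:
        return 1.0  # nothing is required, so the input counts as complete
    return len(required & detected_slots) / len(required)


print(integrity_score("query_bill", {"card_type"}))
```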
[0045] Step A202: Perform semantic coherence detection on the user input and the previous round of user input in the dialogue history to obtain a coherence score; It should be noted that semantic coherence detection refers to assessing whether there is a semantic connection, referential relationship, or logical association between the current round of user input and the previous round of user input. The coherence score is a numerical score that quantifies the degree of semantic association between the current input and historical input. The higher the score, the stronger the connection between the two in terms of topic content, and the more likely the current input is to be a continuation or supplement to the topic of the previous round.
[0046] Optionally, the system first obtains the user input from the current round and the user input from the previous round; then, the system performs semantic representation on the two texts, which can be done by converting the text into vector form using TF-IDF vectors or sentence vectors based on a pre-trained language model; next, the system calculates the similarity between the two vectors, such as cosine similarity, Euclidean distance, or dot product similarity, and maps the similarity value to a preset scoring range as a coherence score; finally, the system also performs specific detection on the referential relationships in the two texts, such as determining whether the current input contains referential words such as "it," "this," and "that," and analyzing whether these referential words have a clear referent in the previous round of input. If so, the coherence score is further increased.
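The coherence computation can be sketched with plain term-count vectors and cosine similarity. Note the substitutions: raw term counts stand in for the TF-IDF or sentence vectors named above, the `boost` value is an assumed parameter, and "the previous round is non-empty" is a crude stand-in for real referent resolution.

```python
import math
import re
from collections import Counter

REFERENTIAL_WORDS = {"it", "this", "that"}


def _bag(text: str) -> Counter:
    """Lower-cased term counts; a simplification of TF-IDF or sentence vectors."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def coherence_score(current: str, previous: str, boost: float = 0.2) -> float:
    """Cosine similarity of the two inputs, raised when the current input
    contains a referential word and a previous round exists to refer to."""
    cur, prev = _bag(current), _bag(previous)
    score = cosine(cur, prev)
    if prev and any(word in cur for word in REFERENTIAL_WORDS):
        score += boost
    return min(score, 1.0)


print(coherence_score("How much is it?", "What is my credit card limit?"))
```

Short follow-ups share few words with the previous round, so lexical similarity alone scores them low; the referential boost is what lifts an input such as "How much is it?".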
[0047] Step A203: The integrity score, the coherence score, and the preset custom rules are input to the low-loss judge, and the low-loss judge outputs a Boolean judgment result indicating whether semantic completion is required.
[0048] It should be noted that the preset custom rules refer to the logical conditions pre-configured based on the characteristics of the business scenario to assist in determining the necessity of semantic completion. Examples include fixed sentence structures that require completion in specific business scenarios, rules that trigger mandatory completion based on specific keywords, and differences in completion strategies applicable to specific user types. The low-loss judge is a lightweight decision module designed with low computational overhead, capable of quickly integrating multiple input signals and outputting clear decision results. Boolean judgment results refer to binary outputs with values of "yes" or "no," indicating the final determination that the current user input "needs semantic completion" or "does not need semantic completion."
[0049] Optionally, the low-loss judge first receives the output completeness score and coherence score, and loads a pre-configured set of custom rules. Then, the judge processes the input information according to a preset decision logic. For example, when the completeness score is below a first threshold and the coherence score is above a second threshold, it is determined that semantic completion is needed, meaning the user input itself is incomplete but highly dependent on the context, and completion can restore complete semantics. When the completeness score is below the first threshold and the coherence score is also below the second threshold, it is determined that semantic completion is not needed, meaning the user input is neither complete nor relevant to the preceding text, possibly indicating a topic switch or input error, and completion might introduce incorrect information. When there are matching items for forced completion or forced non-completion in the custom rules, the judge prioritizes the judgment result of the custom rules. In addition, the low-loss judge aims to reduce the consumption of computing resources and can be implemented using lightweight algorithms such as decision trees or rule engines to ensure that the judgment is completed in milliseconds.
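The decision logic described above fits in a few lines. The threshold values and the `Rule` signature are assumptions, and so is the handling of an input whose completeness score is at or above the first threshold: the text does not cover that case, and the sketch treats it as not needing completion.

```python
from typing import Callable, List, Optional

# A custom rule returns True (force completion), False (force no completion),
# or None (no opinion). This signature is hypothetical.
Rule = Callable[[str], Optional[bool]]


def needs_completion(text: str, completeness: float, coherence: float,
                     rules: List[Rule],
                     first_threshold: float = 0.6,
                     second_threshold: float = 0.5) -> bool:
    # Custom rules take priority over the score-based logic.
    for rule in rules:
        verdict = rule(text)
        if verdict is not None:
            return verdict
    if completeness < first_threshold:
        # Incomplete input: complete it only if it depends on the context.
        return coherence > second_threshold
    # Assumption: an already complete input is left unchanged.
    return False


print(needs_completion("How much is it?", completeness=0.3, coherence=0.8, rules=[]))
```

Because the judge only compares numbers and runs a short rule list, its cost is negligible next to a model call, which matches the stated millisecond target.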
[0050] This application generates completeness and coherence scores through semantic completeness and semantic coherence detection, respectively. These scores, along with preset custom rules, are input into a low-loss judge, which outputs a Boolean judgment result indicating whether semantic completion is needed. Specifically, completeness detection ensures a quantitative assessment of the sufficiency of user input, coherence detection accurately captures the relationship between user input and historical topics, and custom rules incorporate specific requirements of business scenarios. Finally, the low-loss judge makes a comprehensive decision based on multi-dimensional information with minimal computational overhead, achieving rapid and accurate judgment of the necessity for semantic completion.
[0051] In one possible implementation, the step of rewriting the user input using a first model to generate the completed user input includes: triggering the first model when the Boolean judgment result is "yes"; extracting key elements from the dialogue history, and obtaining business domain knowledge from the business knowledge base; concatenating the user input, the dialogue history, the key elements, and the business domain knowledge into a dynamic prompt, and inputting the dynamic prompt into the large language model that serves as the first model; receiving the completed user input output by the large language model; and inputting the completed user input into a validator to verify its reasonableness and authenticity. If the validation passes, the completed user input is output; if the validation fails, the completed user input is discarded and the original user input is output.
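The flow of this implementation (on-demand trigger, prompt assembly, model call, validation with fallback) can be sketched as follows. The prompt layout and the function names are illustrative assumptions, and `first_model` and `validator` are passed in as plain callables so that no particular model API is implied.

```python
from typing import Callable, List


def build_dynamic_prompt(user_input: str, history: List[str],
                         key_elements: List[str],
                         domain_knowledge: List[str]) -> str:
    """Concatenate the four inputs into one prompt (layout is hypothetical)."""
    sections = [
        "Dialogue history:\n" + "\n".join(history),
        "Key elements:\n" + "\n".join(key_elements),
        "Business knowledge:\n" + "\n".join(domain_knowledge),
        "Rewrite the following user input as a self-contained sentence:\n" + user_input,
    ]
    return "\n\n".join(sections)


def complete_user_input(user_input: str, history: List[str],
                        key_elements: List[str], domain_knowledge: List[str],
                        needs_completion: bool,
                        first_model: Callable[[str], str],
                        validator: Callable[[str, str], bool]) -> str:
    if not needs_completion:
        return user_input  # the first model is not invoked at all
    prompt = build_dynamic_prompt(user_input, history, key_elements, domain_knowledge)
    completed = first_model(prompt)
    # Fall back to the original input when validation fails.
    return completed if validator(user_input, completed) else user_input


# Stub model and validator, for illustration only.
stub_model = lambda prompt: "How much is my credit card limit?"
stub_validator = lambda original, completed: "credit card" in completed
print(complete_user_input("How much is it?", ["What is my credit card limit?"],
                          ["credit card limit"], [], True, stub_model, stub_validator))
```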
[0052] It should be noted that a Boolean judgment result of "yes" refers to the low-loss judge's output indicating that semantic completion is required. The triggering of the first model is strictly controlled to occur only when semantic completion is genuinely needed. This on-demand triggering mechanism avoids the first model being indiscriminately invoked in every round of dialogue, effectively reducing system computational resource consumption and processing latency. Key elements refer to information units identified from the dialogue history that have significant reference value for current semantic completion. These include confirmed user intent, filled slot values, entity names mentioned in previous rounds, and user identity feature tags, helping the first model accurately understand the current dialogue context and the direction of information to be supplemented. Business domain knowledge consists of professional knowledge fragments retrieved from the business knowledge base that are relevant to the current dialogue scenario, ensuring that the first model's output conforms to business specifications and factual basis. Dynamic prompts are structured input texts assembled in real-time based on the current dialogue state, used to guide the large language model in generating specific outputs. Reasonableness verification assesses whether the completed user input is semantically fluent, conforms to natural language habits, and is consistent with the core intent of the original user input. Authenticity verification assesses whether the factual information contained in the completed user input is accurate, conforms to the definitions in the business knowledge base, and contains any statements that contradict business facts.
[0053] Optionally, the first model employs a deep neural network based on the Transformer architecture, which consists of multiple stacked encoder and decoder layers, each containing a multi-head self-attention mechanism and a feedforward neural network. The multi-head self-attention mechanism allows the model, while processing the current word, to take into account its dependencies on all other words in the input sequence. Through the self-attention weight matrix, the model can learn to associate a pronoun such as "it" with entities such as "credit card limit" or "billing date" that appeared in previous rounds, thereby achieving context-aware semantic understanding. Furthermore, the first model supports the injection of business knowledge through dynamic prompts, context awareness driven by key elements, and business compliance of the output results.
[0054] This application embodies a triple guarantee mechanism of on-demand triggering, knowledge enhancement, and security verification. On-demand triggering ensures that the first model is only invoked when completion is truly needed, avoiding unnecessary resource consumption; knowledge enhancement improves the accuracy and business relevance of the completion results by introducing key elements and business domain knowledge; and security verification provides a final line of defense that prevents incorrect or fabricated content generated by the large language model from misleading subsequent interactions, thereby improving the coherence of multi-turn dialogues while ensuring the reliability and business compliance of the system output.
[0055] Furthermore, referring to Figure 3, a flowchart of the third embodiment of the intent recognition method of this application is provided. Based on the above, the embodiment shown in Figure 3 further refines the step of "triggering the hierarchical follow-up questioning strategy, generating follow-up questioning information and outputting it to the user" in step S50, including steps A301 to A304: Step A301: Input the initial intent recognition result into the fuzzy intent detector to determine the ambiguity type of the initial intent recognition result. It should be noted that the fuzzy intent detector is a module specifically designed to analyze the clarity of intent recognition results. This module can identify and classify the ambiguity of intent recognition results based on preset ambiguity judgment rules or trained classification models. The ambiguity type is a specific classification of the ambiguity of the user intent, and in this embodiment it includes at least two basic types: intent ambiguity and slot ambiguity.
[0056] Step A302: If the ambiguity type is intent ambiguity, the intent clarification sub-process is triggered to generate follow-up questions for intent confirmation or intent guidance. It should be noted that the intent clarification sub-process is a follow-up questioning process specifically designed to handle intent-level ambiguity, with the goal of helping users clarify their true intent. This sub-process first obtains the list of candidate intents generated during intent recognition together with their confidence scores; it then selects an appropriate follow-up questioning strategy based on the number and confidence distribution of the candidate intents. If there are only a few candidate intents with relatively high confidence, it generates follow-up questions asking the user to confirm which one is intended.
[0057] Step A303: If the ambiguity type is slot ambiguity, the initial intent recognition result is input to the slot analyzer, and the slot analyzer outputs a list of missing slots or a list of slots with low confidence. It should be noted that slot ambiguity refers to a situation where the intent is clear, but the slot information required to fulfill the intent is missing or lacks sufficient confidence. The slot analyzer is a module specifically designed to analyze the slot filling status corresponding to an intent. This module maintains the slot definitions required for each intent type, the necessity indicator of each slot, and the mechanism for assessing slot filling confidence.
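A slot analyzer of this kind can be sketched as a comparison between slot definitions and the currently filled values. The type names, the example slots, and the `min_confidence` default are assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple


class SlotDef(NamedTuple):
    name: str
    required: bool      # necessity indicator


class SlotValue(NamedTuple):
    value: str
    confidence: float   # slot filling confidence in [0, 1]


def analyze_slots(definitions: List[SlotDef],
                  filled: Dict[str, SlotValue],
                  min_confidence: float = 0.7) -> Tuple[List[str], List[str]]:
    """Return (missing required slots, filled slots with low confidence)."""
    missing = [d.name for d in definitions
               if d.required and d.name not in filled]
    low_confidence = [d.name for d in definitions
                      if d.name in filled
                      and filled[d.name].confidence < min_confidence]
    return missing, low_confidence


definitions = [SlotDef("card_type", True), SlotDef("bill_month", True),
               SlotDef("currency", False)]
filled = {"card_type": SlotValue("credit card", 0.55)}
print(analyze_slots(definitions, filled))
```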
[0058] Step A304: Input the list of missing slots or the list of slots with low confidence into the dialogue generator. The dialogue generator calls the generation model based on the large language model to dynamically generate natural language follow-up questions for the slots.
[0059] It should be noted that the dialogue generator is a module specifically designed to generate follow-up questions. This module integrates text generation capabilities based on a large language model, enabling it to dynamically generate natural language questions that conform to business specifications and context based on the input slot information. The generation model based on the large language model is a fine-tuned or prompt-engineered large language model capable of generating specified types of text from input conditions. The natural language follow-up question is the final natural language text presented to the user to ask for missing information or to confirm slot values. Specifically, the dialogue generator first determines, for each slot in the list, what the follow-up question should ask, based on the slot type and the current dialogue context. It then constructs a prompt to guide the generation task. The prompt includes the slot name, slot description, business scenario information, a dialogue history summary, and a generation instruction such as "Please generate a polite question to ask the user about the bill month based on the following information." The dialogue generator inputs the prompt into the generation model, which generates the required natural language follow-up question. The system receives the generated text and outputs it to the user as follow-up information. In this way, the dialogue generator can produce diverse follow-up questions that are closer to how people naturally express themselves, based on the specific slots and dialogue context, thereby improving the user experience.
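The prompt construction described above can be sketched as follows. The field layout and the function names are hypothetical, and `generation_model` is any callable that maps a prompt to text, so no specific large language model API is assumed.

```python
from typing import Callable, Dict


def build_followup_prompt(slot: Dict[str, str], scenario: str,
                          history_summary: str) -> str:
    """Assemble the prompt from the five parts listed in the text."""
    return "\n".join([
        "Please generate a polite question to ask the user about the "
        f"{slot['description']} based on the following information.",
        f"Slot name: {slot['name']}",
        f"Slot description: {slot['description']}",
        f"Business scenario: {scenario}",
        f"Dialogue history summary: {history_summary}",
    ])


def generate_followup(slot: Dict[str, str], scenario: str, history_summary: str,
                      generation_model: Callable[[str], str]) -> str:
    prompt = build_followup_prompt(slot, scenario, history_summary)
    return generation_model(prompt).strip()


# Stub standing in for the large-language-model-based generation model.
stub_model = lambda prompt: " Which month's bill would you like to check? "
slot = {"name": "bill_month", "description": "bill month"}
print(generate_followup(slot, "credit card bill query",
                        "The user wants to check a credit card bill.", stub_model))
```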
[0060] This application uses a fuzzy intent detector to determine whether the ambiguity lies at the intent level or the slot level. For intent ambiguity, it triggers an intent clarification sub-process to generate confirming or guiding follow-up questions. For slot ambiguity, it uses a slot analyzer to identify the list of missing or low-confidence slots, and the dialogue generator calls a large language model to dynamically generate targeted natural language follow-up questions. Specifically, the hierarchical design of the follow-up strategy is achieved by distinguishing between ambiguity types. The intent clarification sub-process ensures effective guidance or confirmation when the user's intent is unclear, avoiding invalid interactions. The slot analyzer accurately locates information gaps and slots with insufficient confidence, providing clear targets for follow-up questions. The dialogue generator uses a large language model to dynamically generate diverse and context-appropriate natural language follow-up questions, overcoming the rigidity and homogeneity of traditional fixed dialogue templates. The overall solution forms a complete closed loop from ambiguity identification to precise follow-up questioning, significantly improving the efficiency of information collection and the naturalness of the user experience in multi-turn dialogues.
[0061] In one possible implementation, the step of dynamically generating natural language follow-up questions for the slot includes: obtaining a pre-generated slot clarification template, which defines the structured relationship between word slots, interface slots, and slot groups; inputting the slot clarification template and the list of slots to be clarified into the generation model based on the large language model; and receiving, as follow-up information, the natural language follow-up questions output by the generation model that match the slots to be clarified.
[0062] It should be noted that in this embodiment, the slot clarification template is a pre-configured template file used to describe the structured relationships between intent-related slots. This template defines the attributes of different slot types and their interrelationships in a standardized format. Word slots are key information fields that need to be directly extracted from user input; interface slots are information fields that need to be obtained by calling external system interfaces; a slot group is a collection of multiple related slots that logically belong to the same information category or should be processed together during follow-up questions. The slot list to be clarified refers to the list of missing slots or slots with low confidence, clarifying the specific slots that need to be followed up with the user. The follow-up information is the output content ultimately presented to the user through the interactive interface or speech synthesis module.
[0063] This application uses a pre-generated slot clarification template as a structured basis, inputting it along with a list of slots to be clarified into a generative model based on a large language model. It then receives the natural language follow-up questions output by the model that match the slots to be clarified, serving as the follow-up information. Specifically, the slot clarification template predefines the structured relationships between word slots, interface slots, and slot groups, providing a clear semantic framework and business constraints for the generation of follow-up questions. Leveraging the powerful generative capabilities of the large language model, the structured slot information is transformed into natural, fluent, and context-appropriate follow-up questions, achieving a leap from "template filling" to "intelligent generation." The generated questions can present diverse expressions based on specific slot types and dialogue contexts, ensuring the accuracy of the follow-up content while significantly improving the naturalness of human-computer interaction and user experience.
[0064] Furthermore, referring to Figure 4, a flowchart of the fourth embodiment of the intent recognition method of this application is provided. Based on the above, the embodiment shown in Figure 4 iteratively optimizes the intent recognition fusion model, specifically including steps A401 to A405: Step A401: Collect low-confidence data from online interaction logs as a dataset to be reflected upon. It should be noted that the online interaction log comprises all interaction records kept by the intent recognition system when processing user requests in the production environment, including user input, the intent recognition results output by the system, confidence scores, processing timestamps, and subsequent user feedback. Low-confidence data refers to data samples for which the highest confidence score output by the system during intent recognition is lower than a preset confidence threshold. The dataset to be reflected upon is the set of low-confidence samples selected by this filtering and awaiting further processing. The confidence threshold can be dynamically adjusted according to the business scenario.
[0065] Step A402: Input each piece of data in the dataset to be reflected into multiple different large language model experts; It should be noted that large language model experts refer to multiple instances of large language models obtained through different architectural designs, different training data, or different fine-tuning methods. These models have their own characteristics and advantages in semantic understanding and intent recognition tasks. The aim is to reduce the bias and errors that may exist in a single model and improve the accuracy of comprehensive judgment through model diversity and differences.
[0066] Step A403: Receive the intent recognition results output by each of the large language model experts, and vote on the multiple intent recognition results. Take the intent with the most votes as the labeled answer for the corresponding data to generate a high-confidence reflection dataset. It's important to note that voting involves statistically analyzing the intent labels output by multiple expert models, using methods such as majority voting, weighted voting, or confidence-weighted voting to determine the final result. The labeled answer refers to the conclusion determined after multi-model voting, serving as the correct intent label for that data point. This conclusion is assumed to be the standard answer with high confidence. For example, if majority voting is used, the intent label that appears most frequently is selected as the labeled answer; if weighted voting is used, different weights are assigned based on the historical accuracy or confidence output of each expert model, and a weighted score is calculated to determine the labeled answer.
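The two voting schemes mentioned above can be sketched as follows. The expert weights in the usage lines are made-up values, and tie-breaking (first label encountered) is an assumption that the text does not specify.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def majority_vote(labels: List[str]) -> str:
    """Return the intent label that appears most often among the expert outputs."""
    return Counter(labels).most_common(1)[0][0]


def weighted_vote(labels: List[str], weights: List[float]) -> str:
    """Return the label with the highest total weight, where each weight
    reflects, for example, an expert's historical accuracy or confidence."""
    totals: Dict[str, float] = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


expert_outputs = ["query_credit_card", "query_debit_card", "query_credit_card"]
print(majority_vote(expert_outputs))
print(weighted_vote(expert_outputs, [0.2, 0.9, 0.3]))
```

With these weights the two schemes disagree: the majority label is `query_credit_card`, while the single high-weight expert carries the weighted vote to `query_debit_card`. This shows why the choice of voting scheme affects the labeled answer.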
[0067] Step A404: The reflection dataset is automatically fed into the model training set, triggering the model to automatically train periodically. It should be noted that the model training set refers to the labeled data set used to train or fine-tune the intent recognition model, typically including historically accumulated standard corpora and newly added labeled data. Automatic inflow refers to the reflection dataset being automatically merged with the existing training set after generation, becoming part of the new training data, without manual intervention. Triggered periodic automatic model training refers to the system automatically initiating the model training process according to a preset time period (e.g., daily, weekly) or a data volume threshold (e.g., 1000 new data entries), using the updated training set to retrain or incrementally fine-tune each sub-model in the intent recognition fusion model.
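The trigger condition can be sketched as a simple check that combines the time period and the data volume threshold. The weekly period is one of the example values given above, and combining the two conditions with "or" is an assumption.

```python
from datetime import datetime, timedelta


def should_retrain(now: datetime, last_trained: datetime, new_samples: int,
                   period: timedelta = timedelta(days=7),
                   sample_threshold: int = 1000) -> bool:
    """Start training when the period has elapsed or enough new data has arrived."""
    return (now - last_trained) >= period or new_samples >= sample_threshold


last = datetime(2026, 3, 1)
print(should_retrain(datetime(2026, 3, 3), last, new_samples=1200))
```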
[0068] Step A405: Automatically deploy the trained and updated model parameters to the production environment, replacing the corresponding model in the original intent recognition fusion model.
[0069] It should be noted that model parameters refer to the model weight file, configuration file, and related metadata obtained after training. For example, after the training task is completed, the system performs automated testing on the new model version to verify that its performance metrics on the test set are no lower than the old version. After the test is passed, the system gradually deploys the new model online through strategies such as canary release or blue-green deployment, replacing the corresponding sub-model in the original intent recognition fusion model. During the replacement process, the system ensures that the service is not interrupted and the user experience is not affected. After the deployment is completed, the system records the deployment time and version number of the new version and continuously monitors the online effect to provide a data foundation for the next iteration.
[0070] This application collects low-confidence online data as a dataset to be reflected upon, and generates a high-confidence reflection dataset by having multiple different large language model experts label the data through voting. The reflection data is automatically fed into the training set to trigger periodic model training, and the updated model parameters are automatically deployed to the production environment to replace the original model. Specifically, the targeted collection of low-confidence data focuses precisely on the model's weaknesses and avoids processing large amounts of high-confidence data to no benefit; the multi-model expert voting mechanism reduces the bias and errors of a single model and automatically generates high-quality training corpus, eliminating reliance on manual annotation; and the closed loop of automatic reflection data inflow, automatic model training, and automatic parameter deployment connects online data to model updates without manual steps, driving the continuous self-evolution and performance optimization of the intent recognition system and significantly improving the efficiency and accuracy of model iteration.
[0071] In one possible implementation, the step of using a dynamically configured intent recognition fusion model through a visual orchestration interface to make collaborative decisions on the completed or unrewritten user input to obtain an initial intent recognition result includes: inputting the reflection dataset into the rule miner, which invokes the large language model to automatically extract high-frequency similar questions from the reflection dataset; converting the high-frequency similar questions into expert rules and storing them in the expert rule base; and calling the rules in the expert rule base first for fast matching, so that when a match is found, the intent recognition result is output directly at the early stage of intent recognition.
[0072] It should be noted that the rule miner is a module specifically designed to automatically extract pattern-based rules from data. This module uses a large language model to perform clustering analysis and pattern recognition on the user input in the reflection dataset. High-frequency similar questions are sets of user questions that appear frequently in the reflection dataset and have similar semantic expressions or syntactic structures. These questions often correspond to high-frequency expressions of a specific intent. For example, the rule miner groups the user input text in the reflection dataset by intent label, performs semantic similarity clustering on all user input under the same intent, and identifies recurring or highly similar expression patterns. For each identified category of high-frequency similar questions, the rule miner uses a large language model to summarize patterns and extract rules. The rule miner can also use template extraction algorithms to separate the variable and fixed parts of similar questions, generating rule templates with wildcards. Expert rules are rules stored in a structured format for quickly matching user input, typically expressed as regular expressions, keyword combinations, syntactic templates, and the like. The expert rule base is a database or knowledge base that centrally stores and manages these expert rules, supporting efficient retrieval and dynamic updates. Prioritized rule matching means that the system matches the user input against the rules in the expert rule base before entering the complex multi-model collaborative decision-making process. If a match is found, the system directly outputs the intent label corresponding to the rule as the recognition result at the initial stage of intent recognition, records the recognition as a rule match, and does not need to call the subsequent deep models. This pre-matching mechanism makes full use of the deterministic patterns in high-frequency scenarios, enabling rapid response at extremely low computational cost.
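The pre-matching mechanism can be sketched with regular-expression rules checked before any model is called. The single rule, its intent label, and the `fallback_model` callable are hypothetical; in the described system the rules would come from the rule miner rather than being written by hand.

```python
import re
from typing import Callable, List, Optional, Pattern, Tuple

ExpertRule = Tuple[Pattern[str], str]  # (compiled pattern, intent label)

# Hypothetical rule base with one hand-written rule.
RULE_BASE: List[ExpertRule] = [
    (re.compile(r"\bcheck\b.*\bcredit card\b.*\bbill\b", re.IGNORECASE),
     "query_credit_card_bill"),
]


def match_rule(text: str, rules: List[ExpertRule]) -> Optional[str]:
    """Return the intent label of the first matching rule, or None."""
    for pattern, intent in rules:
        if pattern.search(text):
            return intent
    return None


def recognize(text: str, rules: List[ExpertRule],
              fallback_model: Callable[[str], str]) -> Tuple[str, str]:
    """Try the rule base first; call the model only when no rule matches.
    The second element records whether the result came from a rule or the model."""
    intent = match_rule(text, rules)
    if intent is not None:
        return intent, "rule"
    return fallback_model(text), "model"


stub_model = lambda text: "unknown"
print(recognize("I want to check my credit card bill", RULE_BASE, stub_model))
```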
[0073] Optionally, the system can also set a confidence coefficient for rule matching. When a rule match succeeds but the scenario requires a high level of confidence, the rule matching result is cross-checked against the model output, thereby further improving the recognition accuracy.
[0074] This application automatically extracts high-frequency similar questions from a reflection dataset using a rule miner, converts them into expert rules, and stores them in a rule base. Before the collaborative decision-making process of the intent recognition fusion model, these expert rules are prioritized for rapid matching, and intent recognition results are directly output upon successful matching. Specifically, by automatically mining high-frequency similar questions from the reflection dataset and converting them into expert rules, dynamic rule generation and continuous updates are achieved, allowing the rule base to keep pace with changes in real user expression habits. The introduction of a pre-rule matching mechanism before collaborative model decision-making enables high-frequency, deterministic user questions to be responded to quickly with minimal computational cost in the initial stage, significantly reducing system latency and model call overhead. Rule matching and model decision-making complement each other, with rules ensuring efficiency in high-frequency scenarios and the model guaranteeing generalization ability in complex scenarios, jointly constructing an efficient, accurate, and evolvable two-layer intent recognition architecture.
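As a non-limiting sketch of the template extraction variant of rule mining, the following groups samples by intent label and replaces the differing parts of similar questions with a wildcard. It deliberately omits the semantic clustering and large language model summarization steps; the function names, the min_count parameter, and the token-level comparison using Python's difflib are assumptions made for illustration.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

WILDCARD = "*"


def merge_template(a: list[str], b: list[str]) -> list[str]:
    """Keep tokens shared by both sequences; collapse differing spans to one wildcard."""
    out: list[str] = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
        elif not out or out[-1] != WILDCARD:
            out.append(WILDCARD)
    return out


def template_to_regex(template: list[str]) -> str:
    """Turn a template into a regex; each wildcard stands for at least one character."""
    parts = [r"(.+)" if tok == WILDCARD else re.escape(tok) for tok in template]
    return r"\s+".join(parts)


def mine_rules(samples: list[tuple[str, str]], min_count: int = 2) -> dict[str, str]:
    """samples: (user text, intent label) pairs from the reflection dataset."""
    by_intent: dict[str, list[list[str]]] = defaultdict(list)
    for text, intent in samples:
        by_intent[intent].append(text.lower().split())
    rules: dict[str, str] = {}
    for intent, token_lists in by_intent.items():
        if len(token_lists) < min_count:
            continue                      # not frequent enough to become a rule
        template = token_lists[0]
        for tokens in token_lists[1:]:
            template = merge_template(template, tokens)
        if any(tok != WILDCARD for tok in template):
            rules[intent] = template_to_regex(template)
    return rules
```

A template produced this way only covers questions that differ by substitution; questions that differ by an inserted or omitted word would need an optional wildcard, which is left out of this sketch.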
[0075] Furthermore, Figure 5 provides a flowchart of the fifth embodiment of the intent recognition method of this application. Based on the above embodiments, the embodiment shown in Figure 5 further refines the step of "dynamically configuring the intent recognition fusion model through a visual orchestration interface" in step S30, including steps A501 to A504. Step A501: Display a list of multiple available intent recognition models through a visual orchestration interface. It should be noted that the visual orchestration interface is the graphical user interface provided to business personnel or system maintenance personnel. This interface presents the various components of the intent recognition system and their interrelationships in an intuitive visual way. The available intent recognition models are the intent recognition model instances that have been integrated and registered in the system and can be invoked in the current business scenario. These include different types such as expert rule-based models, deep learning-based intent classification mini-models, and large language model-based intent recognition models. The list presents the names, types, version numbers, status information, and brief descriptions of these models within the visual orchestration interface, making it easy for users to understand the currently available model resources.
[0076] Optionally, when the system launches the visual orchestration interface, it first retrieves metadata for all deployed and functioning intent recognition models from the model registry center. The interface presents the retrieved model information in a list format, supporting filtering and sorting by model type, functional tags, or business domain. Users can intuitively see the available model resources through this list, providing a foundation for orchestration configuration. Furthermore, the model list display not only includes basic information but can also include model performance metrics such as average response time, accuracy, and recall, helping users make configuration decisions.
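For illustration, retrieving and presenting the model list might look like the sketch below. The ModelInfo fields, the "running" status value, and the in-memory registry are assumptions; the application does not define the registry's interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelInfo:
    name: str
    model_type: str        # e.g. "rule", "small_model", "llm" (illustrative values)
    version: str
    status: str            # only models with status "running" are listed
    avg_latency_ms: float
    accuracy: float


def list_models(registry: list[ModelInfo],
                model_type: Optional[str] = None,
                sort_by: str = "name") -> list[ModelInfo]:
    """Return deployed, functioning models, optionally filtered by type and sorted."""
    rows = [
        m for m in registry
        if m.status == "running" and (model_type is None or m.model_type == model_type)
    ]
    return sorted(rows, key=lambda m: getattr(m, sort_by))
```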
[0077] Step A502: Receive drag and connect operations from the user on the visual orchestration interface for the multiple intent recognition models, and configure the execution order of the multiple intent recognition models; It's important to note that drag-and-drop and connection operations refer to users using the mouse or touch to drag model icons from the model list to the canvas area in the visual orchestration interface, and defining the calling relationships and data flow between models by drawing connecting lines. Execution order refers to the sequence in which multiple models are invoked during intent recognition. Configuring the execution order through drag-and-drop connections makes complex process logic that previously required coding intuitive and visible, allowing business personnel to complete process orchestration without needing to understand the underlying technical details.
[0078] Optionally, the interface also supports adding conditional branches to the connections, such as deciding whether to call subsequent models based on the output confidence of the preceding model, further enriching the configurability of the process.
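One possible way to turn the drawn connections into an execution order, and to honor a confidence condition attached to a connection, is sketched below. It assumes that each connection is a (source, target) pair, that each model returns an (intent, confidence) pair, and that a condition means "call the next model only if the confidence is below the given value"; these conventions and the function names are illustrative, not taken from the application.

```python
from graphlib import TopologicalSorter
from typing import Callable


def execution_order(nodes: list[str], edges: list[tuple[str, str]]) -> list[str]:
    """Derive a call order from canvas connections; raises graphlib.CycleError on a loop."""
    sorter = TopologicalSorter({node: set() for node in nodes})
    for source, target in edges:
        sorter.add(target, source)   # target may run only after source
    return list(sorter.static_order())


def run_chain(order: list[str],
              models: dict[str, Callable[[str], tuple[str, float]]],
              continue_below: dict[str, float],
              text: str) -> list[tuple[str, str, float]]:
    """Call models in order; stop once a model's confidence reaches its branch condition."""
    trace = []
    for name in order:
        intent, confidence = models[name](text)
        trace.append((name, intent, confidence))
        limit = continue_below.get(name)
        if limit is not None and confidence >= limit:
            break                    # confident enough: skip subsequent models
    return trace
```

A connection without a condition simply has no entry in continue_below, so the next model is always called.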
[0079] Step A503: Receive the recall threshold, weight parameters, and execution priority set by the user for each intent recognition model on the visual orchestration interface; It should be noted that the recall threshold refers to the minimum confidence score required for a model's output to be adopted. The weight parameter refers to the weight coefficient of each model's output in the final voting or weighted calculation during multi-model fusion decision-making; the higher the weight, the greater the model's influence on the final result. Execution priority determines the priority level of each model in acquiring computing resources when multiple models can be called in parallel or when resource contention exists; models with higher priority will be executed first.
[0080] Optionally, when a user selects a model node in the canvas, a parameter configuration panel for that model pops up on the right side of the interface. The panel contains configurable parameter fields where the user can enter or select the desired threshold, weight, and priority values. The system receives the user's input in real time and verifies the validity of the values, such as ensuring the recall threshold is between 0 and 1 and the weight parameters are positive. After configuration, the system associates and stores these parameters with the model node. The configurable parameters may differ depending on the model type; for example, rule-based models may focus more on threshold configuration, while deep learning models may focus more on weight configuration.
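The validity check on the parameter panel might be sketched as follows. The range for the recall threshold and the positivity of the weight follow the description above; the NodeParams structure and the requirement that the priority be a non-negative integer are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeParams:
    recall_threshold: float
    weight: float
    priority: int


def validate(params: NodeParams) -> list[str]:
    """Return a list of human-readable problems; an empty list means the values are valid."""
    errors = []
    if not 0.0 <= params.recall_threshold <= 1.0:
        errors.append("recall threshold must be between 0 and 1")
    if params.weight <= 0:
        errors.append("weight must be positive")
    if params.priority < 0:  # assumption: priorities are non-negative integers
        errors.append("priority must not be negative")
    return errors
```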
[0081] Step A504: Encapsulate the execution order, recall threshold, weight parameters, and execution priority into a configuration file as the runtime configuration of the intent recognition fusion model; wherein the intent recognition model includes at least two of the following: an expert rule-based model, a deep learning-based intent classification mini-model, and an intent recognition model based on a large language model.
[0082] It should be noted that a configuration file refers to a file stored in a standard format (such as JSON, YAML, or XML) containing complete orchestration information, which can be parsed by the intent recognition system and applied at runtime. Runtime configuration refers to the model fusion strategy configuration that actually takes effect when the system processes user requests.
[0083] Optionally, after the user has added all model nodes, configured their connections, and set their parameters, the user can click the "Save" or "Publish" button on the interface. The system collects the execution order of all model nodes in the canvas, the parameter settings of each node, and the connection information between nodes, and serializes this information into a configuration file according to a predefined format. The system stores the configuration file in the configuration center and marks it as the effective version for the current business scenario. When a user request enters the system, the intent recognition fusion model loads the configuration file and calls the corresponding models for collaborative decision-making according to the execution order, thresholds, weights, and priorities defined in it.
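A minimal sketch of the save and load steps, assuming JSON as the predefined format, is given below. The field names ("scenario", "pipeline", "model") and the choice of which keys are mandatory are hypothetical; the application only requires that the file carry the execution order and the per-model parameters.

```python
import json

# Assumption: every step must name its model and recall threshold; other keys are optional.
REQUIRED_STEP_KEYS = {"model", "recall_threshold"}


def build_config(scenario: str, order: list[str], params: dict[str, dict]) -> dict:
    """Collect the canvas state into one dictionary; list position encodes execution order."""
    return {
        "scenario": scenario,
        "pipeline": [{"model": name, **params[name]} for name in order],
    }


def dump_config(config: dict) -> str:
    """Serialize the configuration to JSON text for storage in the configuration center."""
    return json.dumps(config, indent=2, sort_keys=True)


def load_config(text: str) -> dict:
    """Parse a stored configuration and reject steps that lack mandatory keys."""
    config = json.loads(text)
    for step in config["pipeline"]:
        missing = REQUIRED_STEP_KEYS - step.keys()
        if missing:
            raise ValueError(f"step {step.get('model')!r} is missing {sorted(missing)}")
    return config
```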
[0084] In one possible implementation, the system also supports version management and historical rollback of configuration files, allowing users to view historical configuration versions at any time and roll back when necessary. For example, in one specific implementation, the system encapsulates user-configured execution order (rule engine → small model → large model), recall threshold of 0.8 for the rule engine, recall threshold of 0.7 and weight of 0.6 for the small model, and recall threshold of 0.8 and weight of 0.4 for the large model into a JSON-formatted configuration file and saves it as "Credit Card Intent Recognition Strategy v3.2". This configuration takes effect immediately, and all subsequent intent recognition requests for credit card services will be subject to model fusion decisions based on this configuration.
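The example strategy above can be written out as data and applied in a simple fusion function, as sketched below. The thresholds and weights are those of the example; the key names, the decision that a rule engine result above its threshold is adopted directly, and the use of a plain weighted vote among the remaining models are assumptions, since the application leaves the exact fusion calculation open.

```python
from typing import Optional

CONFIG = {
    "name": "Credit Card Intent Recognition Strategy v3.2",
    "pipeline": [  # list order is the execution order
        {"model": "rule_engine", "recall_threshold": 0.8},
        {"model": "small_model", "recall_threshold": 0.7, "weight": 0.6},
        {"model": "large_model", "recall_threshold": 0.8, "weight": 0.4},
    ],
}


def fuse(config: dict, outputs: dict[str, tuple[str, float]]) -> Optional[str]:
    """outputs maps a model name to its (intent, confidence); returns the fused intent."""
    scores: dict[str, float] = {}
    for step in config["pipeline"]:
        name = step["model"]
        if name not in outputs:
            continue
        intent, confidence = outputs[name]
        if confidence < step["recall_threshold"]:
            continue                      # below the recall threshold: not adopted
        if "weight" not in step:
            return intent                 # assumption: unweighted rule hit is final
        scores[intent] = scores.get(intent, 0.0) + step["weight"]  # weighted vote
    if not scores:
        return None                       # nothing passed its threshold
    return max(scores, key=scores.get)
```

With these values, when both models pass their thresholds but disagree, the small model's weight of 0.6 outweighs the large model's 0.4; a None result would correspond to a case that needs further handling, such as follow-up questioning.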
[0085] This application displays a list of available models through a visual orchestration interface, accepts drag-and-drop connections from users to configure the execution order, and receives recall thresholds, weight parameters, and execution priorities set by users for each model. These configurations are then encapsulated into a runtime configuration file, enabling dynamic visual orchestration of intent recognition fusion models. Specifically, the visual orchestration interface transforms abstract model fusion strategies into intuitive graphical operations, allowing business personnel to flexibly configure the execution order, invocation conditions, and weight parameters of multiple models according to business needs without programming. This significantly reduces the technical threshold and operational costs of adjusting model fusion strategies. By encapsulating the configuration into a code-independent runtime configuration file, real-time effectiveness and hot reloading of strategies are achieved, supporting rapid adaptation and differentiated configuration across multiple business scenarios. Based on a diversified combination of expert rule models, deep learning small models, and large language models, the complementary advantages of different model types in speed, accuracy, and generalization ability are fully utilized to form a dynamically adjustable optimal collaborative decision-making mechanism.
[0086] It should be noted that the above examples are only for understanding this application and do not constitute a limitation on the intent recognition method of this application. Any simple modifications based on this technical concept are within the protection scope of this application.
[0087] This application provides an intent recognition device, which includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, which are executed by the at least one processor to enable the at least one processor to perform the intent recognition method in Embodiment 1 above.
[0088] Referring to Figure 6, a structural schematic suitable for implementing the intent recognition device in the embodiments of this application is shown. The intent recognition device in the embodiments of this application may include, but is not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (tablet computers), PMPs (Portable Media Players), and in-vehicle terminals (e.g., in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The intent recognition device shown in Figure 6 is merely an example and should not impose any limitations on the functionality and scope of use of the embodiments of this application.
[0089] As shown in Figure 6, the intent recognition device may include a processing unit 1001 (e.g., a central processing unit, a graphics processing unit, etc.), which can perform various appropriate actions and processes according to a program stored in a read-only memory 1002 or a program loaded from a storage device 1003 into a random access memory 1004. The random access memory 1004 also stores various programs and data required for the operation of the intent recognition device. The processing unit 1001, the read-only memory 1002, and the random access memory 1004 are interconnected via a bus 1005. An input/output interface 1006 is also connected to the bus 1005. Typically, the following devices can be connected to the input/output interface 1006: input devices 1007 including, for example, a touchscreen, touchpad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, etc.; output devices 1008 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 1003 including, for example, magnetic tape, hard disk, etc.; and communication devices 1009. The communication device 1009 allows the intent recognition device to communicate with other devices, wirelessly or by wire, to exchange data. Although the figure shows an intent recognition device with various components, it should be understood that not all of the components shown need to be implemented or present. More or fewer components may be implemented instead.
[0090] Specifically, according to the embodiments disclosed in this application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments disclosed in this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via the communication device 1009, or installed from the storage device 1003, or installed from the read-only memory 1002. When the computer program is executed by the processing unit 1001, it performs the functions defined in the methods of the embodiments disclosed in this application.
[0091] The intent recognition device provided in this application, employing the intent recognition method in the above embodiments, can solve the technical problem of how to achieve intent recognition for complex dialogue interactions in multi-service scenarios. Compared with the prior art, the beneficial effects of the intent recognition device provided in this application are the same as those of the intent recognition method provided in the above embodiments, and other technical features in this intent recognition device are the same as those disclosed in the previous embodiment method, and will not be repeated here.
[0092] It should be understood that the various parts disclosed in this application can be implemented using hardware, software, firmware, or a combination thereof. In the description of the above embodiments, specific features, structures, materials, or characteristics can be combined in any suitable manner in one or more embodiments or examples.
[0093] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
[0094] This application provides a computer-readable storage medium having computer-readable program instructions (i.e., a computer program) stored thereon, the computer-readable program instructions being used to execute the intent recognition method in the above embodiments.
[0095] The computer-readable storage medium provided in this application may be, for example, but is not limited to, a USB flash drive, or an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this embodiment, the computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system or device. The program code contained on the computer-readable storage medium may be transmitted using any suitable medium, including but not limited to: wires, optical cables, RF (Radio Frequency), etc., or any suitable combination thereof.
[0096] The aforementioned computer-readable storage medium may be included in the intent recognition device; or it may exist independently and not assembled into the intent recognition device.
[0097] The aforementioned computer-readable storage medium carries one or more programs. When these programs are executed by the intent recognition device, they cause the intent recognition device to: receive user input in the current round; determine, based on the user input and context information retrieved from the dialogue history, whether semantic completion of the user input is needed; when semantic completion is needed, rewrite the user input using a first model to generate completed user input; make collaborative decisions on either the completed user input or the unrewritten user input, using an intent recognition fusion model dynamically configured via a visual orchestration interface, to obtain an initial intent recognition result; when the initial intent recognition result indicates that the user's intent is ambiguous, trigger a hierarchical follow-up questioning strategy, generate follow-up questioning information, and output it to the user; and receive user feedback input in the next round and update the dialogue history based on the feedback input to complete multi-round interaction.
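Purely as an illustrative outline of the control flow of one dialogue round, the steps above might be arranged as follows. Every component is passed in as a callable, and all function and parameter names are hypothetical; the sketch shows only the order of the steps, not how any component is implemented.

```python
from typing import Callable, Optional

Result = tuple[str, float]  # (intent label, confidence); an illustrative result shape


def handle_turn(user_input: str,
                history: list[str],
                needs_completion: Callable[[str, list[str]], bool],
                rewrite: Callable[[str, list[str]], str],
                fusion_model: Callable[[str], Result],
                is_ambiguous: Callable[[Result], bool],
                make_follow_up: Callable[[Result], str]) -> tuple[Result, Optional[str]]:
    """Process one round and return the recognition result plus an optional follow-up."""
    # Decide whether semantic completion is needed, and rewrite only if so.
    if needs_completion(user_input, history):
        text = rewrite(user_input, history)
    else:
        text = user_input
    # Collaborative decision-making by the configured fusion model.
    result = fusion_model(text)
    # Ambiguous intent triggers a follow-up question for the user.
    follow_up = make_follow_up(result) if is_ambiguous(result) else None
    # Record this round so that the next round's feedback is read in context.
    history.append(user_input)
    return result, follow_up
```

Calling handle_turn again with the user's answer to the follow-up question corresponds to the next round of the multi-round interaction.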
[0098] Computer program code for performing the operations of this application can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network—including a Local Area Network (LAN) or a Wide Area Network (WAN)—or can be connected to an external computer (e.g., via the Internet using an Internet service provider).
[0099] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.
[0100] The modules described in the embodiments of this application can be implemented in software or hardware. The name of a module does not limit the functionality of the module itself.
[0101] The readable storage medium provided in this application is a computer-readable storage medium that stores computer-readable program instructions (i.e., a computer program) for executing the above-described intent recognition method, and can solve the technical problem of how to realize intent recognition for complex dialogue interactions in multi-service scenarios. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided in this application are the same as the beneficial effects of the intent recognition method provided in the above embodiments, and will not be repeated here.
[0102] The above description is only a part of the embodiments of this application and does not limit the patent scope of this application. All equivalent structural transformations made under the technical concept of this application and using the contents of the specification and drawings of this application, or direct / indirect applications in other related technical fields, are included in the patent protection scope of this application.
Claims
1. An intent recognition method, characterized in that, The intent recognition method includes: Receive user input for the current round; Based on the user input and the context information retrieved from the dialogue history, determine whether semantic completion of the user input is necessary; When semantic completion of the user input is required, the user input is rewritten using the first model to generate the completed user input; The intent recognition fusion model, which is dynamically configured through a visual orchestration interface, performs collaborative decision-making on the completed user input or the user input before rewriting, and obtains the initial intent recognition result. When the initial intent recognition result indicates that the user's intent is ambiguous, a hierarchical follow-up questioning strategy is triggered to generate follow-up questioning information and output it to the user. Receive user feedback input in the next round and update the dialogue history based on the feedback input to complete multiple rounds of interaction.
2. The intent recognition method as described in claim 1, characterized in that, The step of determining whether semantic completion of the user input is needed includes: The user input is subjected to semantic integrity detection to obtain an integrity score; Semantic coherence detection is performed on the user input and the previous round of user input in the dialogue history to obtain a coherence score; The integrity score, the coherence score, and the preset custom rules are taken as inputs and fed into the low-loss judge. The low-loss judge outputs a Boolean judgment result indicating whether semantic completion is required.
3. The intent recognition method as described in claim 2, characterized in that, The step of rewriting the user input using the first model to generate the completed user input includes: When the Boolean judgment result is "yes", the first model is triggered; Key elements are extracted based on the dialogue history, and business domain knowledge is obtained based on the business knowledge base. The user input, the dialogue history, the key elements, and the business domain knowledge are concatenated into dynamic prompt words, which are then input into the large language model that serves as the first model. Receive the completed user input output by the large language model; The completed user input is input into the validator to verify its rationality and authenticity. If the validation passes, the completed user input is output; if the validation fails, the completed user input is discarded, and the original user input is output.
4. The intent recognition method as described in claim 3, characterized in that, the step of triggering the hierarchical follow-up questioning strategy, generating follow-up questioning information, and outputting it to the user includes: inputting the initial intent recognition result into the fuzzy intent detector to determine the ambiguity type of the initial intent recognition result; if the ambiguity type is intent ambiguity, triggering the intent clarification sub-process to generate follow-up questions that confirm or guide the intent; if the ambiguity type is slot ambiguity, inputting the initial intent recognition result into the slot analyzer, the slot analyzer outputting a list of missing slots or a list of slots with low confidence; and inputting the list of missing slots or the list of slots with low confidence into the dialogue generator, the dialogue generator calling the generative model based on the large language model to dynamically generate natural language follow-up questions for the slot.
5. The intent recognition method as described in claim 4, characterized in that, The step of dynamically generating natural language follow-up questions for the slot includes: Obtain a pre-generated slot clarification template, which defines the structured relationship between word slots, interface slots, and slot groups; The slot clarification template and the list of slots to be clarified are input into the generative model based on the large language model. The natural language follow-up questions output by the generation model that match the slot to be clarified are received as follow-up information.
6. The intent recognition method as described in claim 1, characterized in that, The method further includes iterative optimization of the intent recognition fusion model, specifically including: Collect low-confidence data from online interaction logs as a dataset for reflection; Each piece of data in the dataset to be reflected upon is input into multiple different large language model experts; Receive the intent recognition results output by each of the large language model experts, vote on multiple intent recognition results, and take the intent with the most votes as the labeled answer for the corresponding data to generate a high-confidence reflection dataset; The reflection dataset is automatically fed into the model training set, triggering the model to train automatically on a regular basis. The updated model parameters are automatically deployed to the production environment, replacing the corresponding model in the original intent recognition fusion model.
7. The intent recognition method as described in claim 6, characterized in that, before the step of performing collaborative decision-making on the completed user input or the user input before rewriting, through the intent recognition fusion model dynamically configured through a visual orchestration interface, to obtain the initial intent recognition result, the method further includes: inputting the reflection dataset into the rule miner, and invoking the large language model through the rule miner to automatically extract high-frequency similar questions from the reflection dataset; converting the high-frequency similar questions into expert rules and storing them in the expert rule base; and calling the rules in the expert rule base first for fast matching, and, when a match is found, directly outputting the intent recognition result in the early stage of intent recognition.
8. The intent recognition method as described in claim 1, characterized in that, The steps of dynamically configuring the intent recognition fusion model through a visual orchestration interface include: A list of multiple available intent recognition models is displayed through a visual orchestration interface; Receive user drag-and-drop and connection operations on the multiple intent recognition models on the visual orchestration interface, and configure the execution order of the multiple intent recognition models; Receive the recall threshold, weight parameters, and execution priority set by the user for each intent recognition model on the visual orchestration interface; The execution order, recall threshold, weight parameters, and execution priority are encapsulated into a configuration file as the runtime configuration of the intent recognition fusion model; wherein the intent recognition model includes at least two of the following: an expert rule-based model, a deep learning-based intent classification mini-model, and an intent recognition model based on a large language model.
9. An intent recognition device, characterized in that, The device includes: a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the intent recognition method as described in any one of claims 1 to 7.
10. A storage medium, characterized in that, The storage medium is a computer-readable storage medium, and a computer program is stored on the storage medium. When the computer program is executed by a processor, it implements the steps of the intent recognition method as described in any one of claims 1 to 7.