An intelligent interactive intent recognition method based on fusion of multi-strategy and entity bidirectional matching
By integrating multi-strategy and entity bidirectional matching into an intelligent interactive intent recognition method, the problems of insufficient context awareness, entity parameter completion, and ambiguous intent handling in multi-turn dialogues are solved, thereby improving the accuracy of intent recognition and interaction efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- BEIJING YIXUN ZHENGTONG NETWORK COMM TECH CO LTD
- Filing Date
- 2026-03-17
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies suffer from insufficient context awareness, lack of entity parameter completion capabilities, and rigid handling of ambiguous intents in multi-turn dialogue scenarios, resulting in low accuracy of intent recognition and low interaction efficiency.
An intelligent interactive intent recognition method that integrates multiple strategies and bidirectional entity matching is adopted. By combining a multi-strategy mechanism that integrates vector matching, template matching, deep learning model classification and rule matching, and combining a disambiguation strategy based on bidirectional entity matching scores and dynamic thresholds, the method achieves dialogue context awareness and intelligent parameter filling.
It improves the accuracy of intent recognition and interaction efficiency in multi-turn dialogues, effectively avoids intent misjudgment and invalid interaction, and enhances user experience.
Figure CN122242530A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of natural language processing and intelligent human-computer interaction, and in particular to an intelligent interactive intent recognition method based on the fusion of multiple strategies and bidirectional entity matching.
Background Technology
[0002] With the development of artificial intelligence technology, human-computer interaction systems based on natural language have gradually become the main way for users to communicate with smart devices. Intent recognition, as a core component of intelligent interaction systems, aims to accurately identify the behavioral intent expressed by users through natural language, providing a basis for subsequent function execution.
[0003] Existing techniques for intent recognition mainly include rule-based matching, template matching, deep learning models, and vector similarity matching. However, a single recognition method often has significant drawbacks: rule-based or template-based methods struggle to cover the diversity of user expressions; pure deep learning models rely on large amounts of labeled data and do not adequately utilize context; and vector matching methods are easily affected by the coverage of training examples.
[0004] More importantly, existing technologies face three core problems in multi-turn dialogue scenarios:
[0005] (1) Insufficient context awareness makes it difficult to accurately determine the type of user intent transfer: Existing technologies have not established quantitative rules for determining intent transfer, and only judge the intent state through simple semantic matching. For example, if a user asks "What about Chengdu?" after querying "Beijing weather", some systems will misjudge it as a new intent rather than an update of the location parameters of the "weather query" intent, resulting in poor fluency in multi-turn dialogues;
[0006] (2) Lack of intelligent entity parameter completion capability. When the user input information is incomplete (such as “play music” without specifying the singer), the existing technology often directly asks the user without combining information such as user profile, geographical location, and conversation context for intelligent filling. Frequent invalid queries reduce the interaction efficiency.
[0007] (3) The handling of ambiguous intents is rigid, and fixed thresholds are often used to determine ambiguity. The system lacks intelligent completion and proactive clarification capabilities, which seriously affects the user experience.
[0008] Therefore, how to solve the problems of mismatch between intent and entity in multi-turn dialogues, ambiguity and vagueness of intent, and improve the accuracy of intent recognition are technical problems that need to be solved.
Summary of the Invention
[0009] Based on this, the present invention provides an intelligent interactive intent recognition method based on the fusion of multiple strategies and bidirectional entity matching. The present invention addresses the problems of the prior art by integrating a multi-strategy initial screening mechanism, a scoring mechanism for bidirectional matching of intent and entity, a quantified intent transfer judgment rule, a dynamic threshold disambiguation strategy, and multi-dimensional parameter dynamic filling, thereby improving the accuracy of intent recognition and the interaction efficiency in multi-turn dialogues.
[0010] One embodiment of the present invention provides an intelligent interactive intent recognition method based on the fusion of multi-strategy and bidirectional entity matching, the method comprising:
[0011] Receive user's voice or text input, and perform speech-to-text processing on the voice input to obtain a unified text input;
[0012] The text input is standardized and preprocessed, and combined with the conversation context, a multi-strategy mechanism of vector matching, template matching, deep learning model classification and rule matching is integrated, and candidate intents and their initial confidence are generated based on the weight coefficients of each strategy.
[0013] Perform generic named entity recognition on the standardized preprocessed text to extract entity information;
[0014] The candidate intents are reordered using the entity bidirectional matching score, which includes a forward matching score based on entity coverage and matching degree and a reverse matching score based on entity specificity and relevance.
[0015] Determine the current active intent by combining context and analyze intent shift types and intent disambiguation processing;
[0016] The final identified intent is output and the corresponding task is triggered.
[0017] In some embodiments of the present invention, the standardization preprocessing of text input, combined with the session context, integrates a multi-strategy mechanism of vector matching, template matching, deep learning model classification, and rule matching, and generates candidate intents and their initial confidence scores based on the weight coefficients corresponding to each strategy, specifically includes:
[0018] The text input is cleaned and preprocessed by word segmentation, and the processing results of the two modalities are unified into standardized interactive text.
[0019] The current session's dialogue context is obtained based on the session ID. The dialogue context includes the current intent ID, collected parameters, context entities, interaction history, and modal input records.
[0020] A multi-strategy mechanism is used to perform preliminary intent recognition on the standardized interactive text, generating multiple candidate intents and their corresponding initial confidence scores. The multi-strategy mechanism specifically includes four strategies: vector matching, template matching, deep learning model classification, and rule matching. Each strategy is configured with dynamically adjustable weight coefficients. For each candidate intent, the initial confidence score obtained under each strategy is multiplied by the corresponding weight coefficient, and the confidence scores of the same intent are weighted and averaged to obtain the initial confidence score S_initial after fusion of the candidate intents. The calculation formula is as follows:
[0021] S_initial=∑(conf_initial_i×wc_i);
[0022] In the formula, conf_initial_i is the initial confidence obtained under the i-th strategy, wc_i is the weight coefficient of that strategy, and ∑wc_i=1.
[0023] In some embodiments of the present invention, the step of performing general named entity recognition on the standardized preprocessed text to extract entity information specifically includes:
[0024] Perform generic named entity recognition on the standardized preprocessed text to extract one or more entities. Each entity includes entity type, entity value, confidence level, and source.
[0025] In some embodiments of the present invention, the forward matching score based on entity coverage and matching degree specifically includes:
[0026] The extracted entity information is forward matched with the preset entity parameters of each candidate intent, and the forward matching score S_forward is calculated using the following formula:
[0027] S_forward=(N_matched / N_required)×(Σ(conf_entity_i×w_i) / N_matched);
[0028] In the formula, N_matched is the number of required entities for a successful match, N_required is the total number of required entities required by the intent, conf_entity_i is the confidence level of the i-th matched entity, w_i is the preset importance weight coefficient of the entity in the intent, and Σw_i=1.
[0029] In some embodiments of the present invention, the reverse matching score based on entity specificity and relevance specifically includes:
[0030] Based on the extracted entity information, a matching intent set is found in reverse, and the reverse matching score S_reverse is calculated. The calculation formula is as follows:
[0031] S_reverse=(Σexclusivity(e_j,intent_k)×relevance(e_j,intent_k)) / M;
[0032] In the formula, M is the total number of entities extracted, exclusivity(e_j, intent_k) is the exclusivity of entity e_j to intent_k, and relevance(e_j, intent_k) is the relevance between entity e_j and intent_k.
[0033] In some embodiments of the present invention, the reordering of the candidate intents specifically includes:
[0034] The comprehensive confidence S_comprehensive is calculated from the fused initial confidence S_initial, the forward matching score S_forward, and the reverse matching score S_reverse of each candidate intent: S_comprehensive = α × S_initial + β × S_forward + γ × S_reverse, where α is a preset weight coefficient for the initial confidence, β is a preset weight coefficient for the forward matching score, γ is a preset weight coefficient for the reverse matching score, and α + β + γ = 1. The candidate intents are then reordered in descending order of S_comprehensive.
[0035] In some embodiments of the present invention, the step of determining the current active intent based on the context and analyzing the intent transfer type and intent disambiguation processing specifically includes:
[0036] Based on the context information, it is determined whether there is an active intent in the current session, and the user intent transfer type is analyzed based on the new input content. The intent transfer type includes continue, repair, refine, switch, or cancel. When the difference between the highest overall confidence score and the second highest overall confidence score after reordering is less than a preset ambiguity threshold, a clarification question is generated, and at least two high-confidence candidate intents are displayed to the user for confirmation. The preset ambiguity threshold is dynamically adjusted according to the dialogue round and the historical ambiguity rate.
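The ambiguity check described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `clarification_candidates` is hypothetical, and the ambiguity threshold is passed in as a plain number because the rule for adjusting it by dialogue round and historical ambiguity rate is not given at this point in the text.

```python
from typing import List, Tuple


def clarification_candidates(
    ranked: List[Tuple[str, float]], ambiguity_threshold: float
) -> List[str]:
    """Return the intents to show in a clarification question, or [] if none is needed.

    `ranked` holds (intent, comprehensive confidence) pairs sorted in
    descending order of confidence.
    """
    if len(ranked) < 2:
        return []  # a single candidate cannot be ambiguous
    (top_intent, top_score), (second_intent, second_score) = ranked[0], ranked[1]
    # Ambiguous when the gap between the two best scores is below the threshold.
    if top_score - second_score < ambiguity_threshold:
        return [top_intent, second_intent]
    return []


# Hypothetical scores and threshold, for illustration only.
close_call = clarification_candidates([("QUERY_WEATHER", 0.80), ("QUERY_AIR_QUALITY", 0.77)], 0.05)
clear_win = clarification_candidates([("QUERY_WEATHER", 0.90), ("PLAY_MUSIC", 0.50)], 0.05)
```

In the first call the margin is smaller than the threshold, so both candidates would be offered to the user; in the second the top intent is accepted without a question.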
[0037] In some embodiments of the present invention, the output that ultimately identifies the intent and triggers the execution of the corresponding task specifically includes:
[0038] Before determining the final identification intent, check whether the necessary entity parameters are complete, and fill in the missing parameters statically or dynamically according to the preset default value type; output the final identification intent and the corresponding entity parameters, and trigger the corresponding dialogue management or task execution process.
[0039] Compared with the prior art, the present invention has the following beneficial effects:
[0040] (1) By establishing clear context-aware logic and quantitative intent transfer judgment rules, the intent state of users in multi-turn dialogues (continue, repair, refine, switch or cancel) can be accurately identified. The user's parameter update, supplement and other operations on the existing intent can be accurately identified, effectively avoiding the problem of intent misjudgment and greatly improving the fluency of multi-turn dialogues.
[0041] (2) The intention disambiguation strategy of dynamic threshold is adopted to actively clarify to the user when the intention is ambiguous, and dynamic parameters are filled in by combining context, time, geographical location and user profile, which reduces invalid interaction and improves user experience and task completion efficiency.
[0042] (3) The disambiguation threshold is dynamically adjusted based on the dialogue rounds and historical ambiguity rate to replace the fixed threshold judgment mode. At the same time, when the intent is ambiguous, the clarification question is actively generated and high-confidence candidate intents are displayed for user confirmation. Combined with intelligent parameter completion capabilities, the flexible and intelligent processing of ambiguous intents is realized, which significantly improves the user interaction experience. At the same time, the present invention integrates a multi-strategy and entity bidirectional matching recognition method to realize mutual verification between entities and intents, effectively solving the matching error problem caused by single intent recognition or entity extraction, improving the overall accuracy of intent recognition from the bottom layer, and providing accurate basic recognition results support for disambiguation and parameter filling.
Attached Figure Description
[0043] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0044] Figure 1 is a flowchart of an intelligent interactive intent recognition method based on the fusion of multiple strategies and bidirectional entity matching proposed in an embodiment of the present invention.
[0045] Figure 2 illustrates the integrated application of the multi-turn dialogue scenario (intent transfer and dynamic filling) proposed in the embodiments of the present invention.
Detailed Implementation
[0046] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0047] This invention provides an intelligent interactive intent recognition method based on the fusion of multi-strategy and bidirectional entity matching. As shown in Figure 1, it includes the following steps:
[0048] S101, Receive user's voice or text input, and perform voice-to-text processing on the voice input to obtain unified text input.
[0049] Specifically, in a preferred embodiment of the present invention,
[0050] The system receives interaction requests from users via voice (such as voice assistants) or text (such as intelligent customer service or chatbots). The system receives input information and records the input modality. For example, after receiving a user's voice input request "What will the weather be like in Chengdu tomorrow?", the system converts the voice into the text "What will the weather be like in Chengdu tomorrow?".
[0051] S102, Perform standardized preprocessing on the text input and, combined with the conversation context, apply a multi-strategy mechanism that integrates vector matching, template matching, deep learning model classification, and rule matching; generate candidate intents and their initial confidence scores based on the weight coefficients of each strategy.
[0052] Specifically, in a preferred embodiment of the present invention,
[0053] The text input is cleaned and preprocessed by word segmentation, and the processing results of the two modalities are unified into standardized interactive text.
[0054] The current session's dialogue context is obtained based on the session ID. The dialogue context includes the current intent ID, collected parameters, context entities, interaction history, and modal input records.
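The dialogue context listed above could be held in a structure like the one below. This is only a sketch under stated assumptions: the class name `DialogueContext`, its field names, the in-memory `_SESSIONS` store, and the helper `load_context` are all hypothetical and chosen to mirror the five items named in the text.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class DialogueContext:
    """Per-session state; field names are illustrative, not taken from the source."""
    session_id: str
    current_intent_id: Optional[str] = None  # None means no active intent
    collected_params: Dict[str, Any] = field(default_factory=dict)
    context_entities: List[Dict[str, Any]] = field(default_factory=list)
    interaction_history: List[str] = field(default_factory=list)
    modal_input_records: List[str] = field(default_factory=list)


# Hypothetical in-memory store keyed by session ID.
_SESSIONS: Dict[str, DialogueContext] = {}


def load_context(session_id: str) -> DialogueContext:
    """Return the stored context, creating an empty one for a new session."""
    if session_id not in _SESSIONS:
        _SESSIONS[session_id] = DialogueContext(session_id)
    return _SESSIONS[session_id]


ctx = load_context("session-001")
ctx.interaction_history.append("What will the weather be like in Chengdu tomorrow?")
ctx.modal_input_records.append("speech")
```

A freshly created context corresponds to the "empty" state used in the worked example: no current intent ID and no collected parameters or entities.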
[0055] A multi-strategy mechanism is used to perform preliminary intent recognition on the standardized interactive text, generating multiple candidate intents and their corresponding initial confidence scores. The multi-strategy mechanism specifically includes four strategies: vector matching, template matching, deep learning model classification, and rule matching. Each strategy is configured with dynamically adjustable weight coefficients. For each candidate intent, the initial confidence score obtained under each strategy is multiplied by the corresponding weight coefficient, and the confidence scores of the same intent are weighted and averaged to obtain the initial confidence score S_initial after fusion of the candidate intents. The calculation formula is as follows:
[0056] S_initial=∑(conf_initial_i×wc_i);
[0057] In the formula, conf_initial_i is the i-th initial confidence level, wc_i is the weight coefficient corresponding to the strategy, and ∑wc_i=1. The weight coefficients of each strategy are dynamically adjusted according to a multi-dimensional feedback mechanism, mainly including: statistical feedback based on historical recognition accuracy (strategies with higher accuracy receive higher weights), preset mapping based on application scenarios and domain characteristics, and long-term learning based on users' personalized interaction habits (fine-tuning weights for specific users' expression preferences). This enables the fusion multi-strategy model to adapt to different business environments and always maintain superior initial recognition performance.
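One simple way to realize the accuracy-based part of this feedback is to make each weight proportional to the strategy's historical accuracy and renormalize so that the weights sum to 1. The sketch below assumes exactly that rule; the proportional scheme, the function name `weights_from_accuracy`, and the accuracy figures are illustrative assumptions, since the text only states that more accurate strategies receive higher weights.

```python
from typing import Dict


def weights_from_accuracy(accuracy: Dict[str, float]) -> Dict[str, float]:
    """Map per-strategy historical accuracy to weights that sum to 1."""
    total = sum(accuracy.values())
    if total <= 0:
        # No usable history: fall back to uniform weights.
        return {name: 1.0 / len(accuracy) for name in accuracy}
    return {name: acc / total for name, acc in accuracy.items()}


# Hypothetical historical accuracies for the four strategies.
history = {"vector": 0.90, "template": 0.60, "model": 0.95, "rule": 0.75}
weights = weights_from_accuracy(history)
```

Scenario presets and per-user fine-tuning, which the paragraph also mentions, would adjust these values further and are not modeled here.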
[0058] The four strategies—vector matching, template matching, deep learning model classification, and rule matching—independently calculate the initial confidence scores for their respective intentions, without interfering with each other. The specific methods for obtaining the initial confidence scores (conf_initial_i) for each of the four strategies include:
[0059] 1. Initial confidence score for vector matching (conf_initial_vector): A pre-built intent vector library is constructed, and the standard representations of each intent are converted into vectors and stored. The standardized interactive text is converted into vectors of the same dimension. The cosine similarity between the text vector and each intent vector in the intent vector library is calculated, and the cosine similarity is used as the initial confidence score for vector matching. The higher the similarity, the higher the confidence score. The value range is [0,1].
[0060] 2. Initial confidence score for template matching (conf_initial_template): Based on the typical expressions of each intent, a corresponding matching template is designed in advance (e.g., the template for a weather query intent is "[time] + [location] + weather + question word"). The standardized interactive text is compared with the matching template of each intent, and the matching degree between the text and the template is calculated as (number of matched characters / total number of characters in the template). This matching degree is used as the initial confidence score for template matching. The higher the matching degree, the higher the confidence score, with a value range of [0,1]. If the text and the template match completely, the confidence score can be set to 1.0; if they do not match at all, the confidence score is set to 0.
[0061] 3. Initial Confidence of Deep Learning Model Classification (conf_initial_model): A trained deep learning classification model is used. In a preferred embodiment of this invention, a BERT-based version of the NER model is adopted. This model has a good balance between performance and efficiency in general domain entity recognition tasks. Standardized interactive text is used as the model input, and the model output layer outputs the probability value of each candidate intent through the softmax function. This probability value is the initial confidence of the deep learning model classification. The sum of the confidence of all candidate intents is 1, and the value range is [0,1]. The deep learning model needs to be pre-trained based on the labeled "text-intent" training corpus. After training, it is verified through the test set to ensure that the recognition accuracy meets the preset requirements.
[0062] 4. Initial confidence score for rule matching (conf_initial_rule): Recognition rules are preset for each intent (e.g., the weather query intent rule is "contains keywords such as 'weather', 'temperature', and 'sunny / rainy', and contains time and location entities"). Keyword matching and entity matching are performed on the standardized interactive text to determine whether it meets the recognition rules of the corresponding intent, and an initial confidence score is assigned according to how fully the rules are satisfied: 0.8-1.0 when the rules are fully satisfied, 0.4-0.7 when they are partially satisfied (e.g., non-essential keywords are missing), and 0-0.3 when they are not satisfied at all. The specific values can be preset according to the business scenario and dynamically optimized based on historical interaction data using machine learning methods.
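Two of the four strategies above are simple enough to sketch directly. The code below is an illustration under assumptions: negative cosine values are clipped to 0 so that the result stays in the stated [0,1] range, the rule confidence uses one fixed representative value per band (0.9, 0.5, 0.1), and "partially satisfied" is read as "keyword or entities present but not both". None of these specifics come from the source, and the function names are hypothetical.

```python
import math
from typing import Sequence, Set


def cosine_confidence(text_vec: Sequence[float], intent_vec: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors, clipped to [0, 1]."""
    dot = sum(a * b for a, b in zip(text_vec, intent_vec))
    norm = math.sqrt(sum(a * a for a in text_vec)) * math.sqrt(sum(b * b for b in intent_vec))
    if norm == 0:
        return 0.0  # a zero vector carries no direction
    return max(0.0, dot / norm)


def rule_confidence(
    text: str, keywords: Set[str], found_types: Set[str], required_types: Set[str]
) -> float:
    """Assign a banded confidence from keyword and entity checks."""
    has_keyword = any(word in text for word in keywords)
    has_entities = required_types <= found_types
    if has_keyword and has_entities:
        return 0.9  # assumed value inside the 0.8-1.0 band
    if has_keyword or has_entities:
        return 0.5  # assumed value inside the 0.4-0.7 band
    return 0.1      # assumed value inside the 0-0.3 band


rule_score = rule_confidence(
    "what will the weather be like in chengdu tomorrow",
    {"weather", "temperature", "sunny", "rainy"},
    {"DATE", "LOCATION"},
    {"DATE", "LOCATION"},
)
```

In practice the text vector and intent vectors would come from a sentence encoder, which is outside the scope of this sketch.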
[0063] For example, in the above example of "What will the weather be like in Chengdu tomorrow?", the system performs preprocessing such as speech-to-text conversion, cleaning, word segmentation, and stop word removal to obtain standardized interactive text. The input modality is labeled as "speech," and the processing timestamp is 2026-01-01 10:00:00. The system loads the dialogue context based on the user's session ID. The current context is: no current intent ID, empty collected parameters, empty context entities, no historical interaction records, and empty modal input records. The four strategies (vector matching, template matching, deep learning model classification, and rule matching) yield the following initial confidence scores for the weather query intent (QUERY_WEATHER): conf_initial_vector = 0.91, conf_initial_template = 0.86, conf_initial_model = 0.93, and conf_initial_rule = 0.82. With preset weight coefficients W_vector = 0.3, W_template = 0.1, W_model = 0.4, and W_rule = 0.2, the initial confidence score after fusion is S_initial = 0.91 × 0.3 + 0.86 × 0.1 + 0.93 × 0.4 + 0.82 × 0.2 = 0.899, and the candidate intent is "weather query".
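The fusion step can be checked with a few lines of code. The sketch below implements S_initial = ∑(conf_initial_i × wc_i) with the example's inputs; the function name `fuse_confidence` and the dictionary keys are hypothetical. Evaluating the four products exactly as listed gives about 0.895, slightly below the 0.899 quoted in the example, so the printed figure should be treated as approximate.

```python
from typing import Dict


def fuse_confidence(conf: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-strategy confidences; weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("strategy weights must sum to 1")
    return sum(conf[name] * weights[name] for name in weights)


# Per-strategy confidences and weights from the worked example.
conf = {"vector": 0.91, "template": 0.86, "model": 0.93, "rule": 0.82}
weights = {"vector": 0.3, "template": 0.1, "model": 0.4, "rule": 0.2}

s_initial = fuse_confidence(conf, weights)
```

Because the weights sum to 1, the fused score always lies between the smallest and largest per-strategy confidence.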
[0064] Note: A parameter adjustment template is provided for general human-computer interaction scenarios (see Table 1). Other vertical scenarios can be adapted based on this template.
[0065] Table 1:
[0066]
[0067] The rules for adapting parameters for vertical scenes can be found in Table 2:
[0068] Table 2:
[0069]
[0070] S103, Perform general named entity recognition on the standardized preprocessed text to extract entity information.
[0071] Specifically, in a preferred embodiment of the present invention,
[0072] After completing the text standardization preprocessing, generic named entity recognition (NER) is performed on the standardized interactive text to extract one or more entities. Each entity includes entity type, entity value, confidence level, and source. As a preferred implementation, NER uses a BERT-based version, which offers a good balance between performance and efficiency in general domain entity recognition tasks. During entity extraction, the system generates a confidence level for each identified entity, ranging from [0,1]. To ensure extraction quality, this embodiment sets an entity confidence threshold T=0.5, retaining only entities with a confidence level ≥T for subsequent processing, and discarding entities below this threshold. For critical scenarios, a retry mechanism can be configured: when a necessary entity is missing, the threshold can be lowered to 0.3 for re-extraction, or a different model can be used for secondary recognition. The threshold of 0.5 is set based on optimization results from experimental data. Experiments on 1000 test corpora show that when the threshold is set to 0.5, the entity recognition accuracy reaches 92%, the recall rate reaches 88%, and the confidence rate is 90%. A threshold that is too low (such as 0.3) will lead to an increase in noisy entities, and the accuracy will drop to 78%. A threshold that is too high (such as 0.7) will cause the recall rate to drop sharply to 65%. Therefore, 0.5 is the preferred threshold in this embodiment.
[0073] Furthermore, when re-executing the entity extraction operation, the success rate and confidence of entity extraction can be improved by adjusting the input parameters of the BERT-base version of the NER model (such as window size and number of iterations) or by performing secondary optimization on the standardized preprocessed text (such as supplementing contextual information), thus ensuring the stable and efficient execution of the entity extraction process.
[0074] For example, in the above example of "What will the weather be like in Chengdu tomorrow?", the following can be extracted:
[0075] Entity 1: Type = DATE, Value = "tomorrow", Confidence = 0.98, Source = Voice;
[0076] Entity 2: Type = LOCATION, Value = "Chengdu", Confidence = 0.96, Source = Voice.
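The entity record and the threshold filter with a retry step might look like the sketch below. It makes several assumptions: the `Entity` class and `filter_entities` function are hypothetical names, and the retry is modeled by re-reading the same candidate list at the lower threshold of 0.3 for missing required types only, whereas the text describes re-running extraction or using a different model.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Entity:
    type: str
    value: str
    confidence: float
    source: str


def filter_entities(
    entities: List[Entity],
    required_types: Iterable[str],
    threshold: float = 0.5,
    retry_threshold: float = 0.3,
) -> List[Entity]:
    """Keep entities at or above the threshold; relax it only for missing required types."""
    kept = [e for e in entities if e.confidence >= threshold]
    missing = set(required_types) - {e.type for e in kept}
    if missing:
        # Retry step: admit lower-confidence entities for the missing types only.
        kept += [
            e for e in entities
            if e.type in missing and retry_threshold <= e.confidence < threshold
        ]
    return kept


example_entities = [
    Entity("DATE", "tomorrow", 0.98, "Voice"),
    Entity("LOCATION", "Chengdu", 0.96, "Voice"),
]
kept = filter_entities(example_entities, required_types=["DATE", "LOCATION"])
```

With the example's two entities, both pass the 0.5 threshold on the first pass and the retry branch is never entered.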
[0077] S104, Reorder the candidate intents using the entity bidirectional matching score, which includes a forward matching score based on entity coverage and matching degree and a reverse matching score based on entity specificity and relevance.
[0078] Specifically, in a preferred embodiment of the present invention,
[0079] The extracted entity information is forward matched with the preset entity parameters of each candidate intent, and a forward matching score combining entity coverage and matching degree is calculated. Based on the extracted entity information, a suitable intent set is searched in reverse, and a reverse matching score combining entity and intent specificity and relevance is calculated. The initial confidence, forward matching score and reverse matching score of each candidate intent are combined to calculate a comprehensive confidence, and the candidate intents are re-ranked accordingly. After re-ranking, the candidate intent with the highest comprehensive confidence is the candidate that best matches the user's true intent.
[0080] Specifically, for the forward matching score based on entity coverage and matching degree, the extracted entity information is forward matched against the preset entity parameters of each candidate intent, and the forward matching score S_forward is calculated. The calculation formula is as follows:
[0081] S_forward=(N_matched / N_required)×(Σ(conf_entity_i×w_i) / N_matched);
[0082] In the formula, N_matched is the number of required entities that are successfully matched, N_required is the total number of required entities required by the intent, conf_entity_i is the confidence level of the i-th matched entity, and w_i is the preset importance weight coefficient of the entity in the intent, and Σw_i=1. The importance weight coefficient w_i is used to distinguish the contribution of different entities in the intent recognition process. It can be preset by business experts or obtained through statistical learning methods based on training corpus. In the preset method based on business knowledge, during the intent template definition stage, business experts can assign weights to different required entities under the same intent according to business logic and domain knowledge. In the self-learning method based on corpus statistics, when there is a certain scale of user-annotated corpus, the weight of each entity can be automatically learned through statistical learning methods.
[0083] For example, in the above example of "What will the weather be like in Chengdu tomorrow?", the required entities for the weather query intent are DATE and LOCATION, both of which have been matched, so N_matched / N_required=1, and the average entity match confidence score is (0.98+0.96) / 2=0.97. Let w_DATE=0.6 and w_LOCATION=0.4; then:
[0084] Σ(conf_entity_i×w_i)=0.98×0.6+0.96×0.4=0.972, and the forward matching score S_forward=1×0.972=0.972.
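The forward score can be sketched as below, with hypothetical function names. One point deserves care: the formula as printed divides the weighted sum by N_matched, which for these inputs gives 1 × 0.972 / 2 = 0.486, while the worked example multiplies coverage by the weighted sum directly and reports 0.972. The sketch therefore exposes both quantities so that either reading can be checked.

```python
from typing import List, Tuple

# Each matched required entity is given as (confidence, importance weight).
Matched = List[Tuple[float, float]]


def weighted_confidence(matched: Matched) -> float:
    """Sum of conf_entity_i * w_i over the matched required entities."""
    return sum(conf * weight for conf, weight in matched)


def forward_score(matched: Matched, n_required: int) -> float:
    """S_forward as printed: (N_matched / N_required) * (weighted sum / N_matched)."""
    n_matched = len(matched)
    if n_matched == 0 or n_required == 0:
        return 0.0
    coverage = n_matched / n_required
    return coverage * (weighted_confidence(matched) / n_matched)


# DATE (0.98, w=0.6) and LOCATION (0.96, w=0.4) from the worked example.
matched = [(0.98, 0.6), (0.96, 0.4)]
weighted = weighted_confidence(matched)          # the example's 0.972
as_printed = forward_score(matched, n_required=2)
```

Whichever reading is adopted, it should be applied consistently across all candidate intents, since only the relative ordering matters for re-ranking.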
[0085] The reverse matching score S_reverse, based on entity specificity and relevance, assists the forward matching score, corrects the matching deviation between entities and intents, and improves the comprehensiveness and accuracy of intent matching. Together with the forward matching score, it serves as the basis for calculating the overall confidence score. Based on the entity information extracted in the preceding steps, all suitable intent sets are searched in reverse. The reverse matching score S_reverse for each candidate intent is calculated using a preset formula:
[0086] S_reverse=(Σexclusivity(e_j,intent_k)×relevance(e_j,intent_k)) / M;
[0087] In the formula, M represents the total number of entities extracted, exclusivity(e_j, intent_k) is the exclusivity of entity e_j to intent_k, and relevance(e_j, intent_k) is the relevance between entity e_j and intent_k. exclusivity(e_j, intent_k) is calculated from historical dialogue data as the ratio of the number of times entity e_j appears in intent_k to the total number of times it appears in all intents. The closer this value is to 1, the more exclusive the entity is to that intent. relevance(e_j, intent_k) is calculated from the pointwise mutual information (PMI) of the intent-entity co-occurrence matrix: relevance(e_j, intent_k) = PMI(e_j, intent_k) = log(P(e_j, intent_k) / (P(e_j) × P(intent_k))), where P(e_j, intent_k) is the probability of co-occurrence of entity e_j and intent_k, P(e_j) is the marginal probability of entity e_j appearing, and P(intent_k) is the marginal probability of intent_k appearing. The probability values are obtained from statistics over historical corpora.
[0088] For example, in the above example of "What will the weather be like in Chengdu tomorrow", both extracted entities match the weather query intent. According to historical corpus statistics, the entity "tomorrow" has an exclusivity of 0.85 and a relevance of 0.92 for the weather intent; the entity "Chengdu" has an exclusivity of 0.78 and a relevance of 0.95; the reverse matching score S_reverse=(0.85×0.92+0.78×0.95) / 2=(0.782+0.741) / 2=0.7615.
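The reverse score and its two ingredients can be sketched as follows. Function names are hypothetical, the natural logarithm is assumed because the text does not state a base, and the counts passed to `exclusivity` are made-up figures chosen to reproduce the example's 0.85.

```python
import math
from typing import List, Tuple


def exclusivity(count_in_intent: int, count_in_all_intents: int) -> float:
    """Share of an entity's historical occurrences that fall under one intent."""
    if count_in_all_intents == 0:
        return 0.0
    return count_in_intent / count_in_all_intents


def pmi(p_joint: float, p_entity: float, p_intent: float) -> float:
    """Pointwise mutual information, natural log assumed."""
    return math.log(p_joint / (p_entity * p_intent))


def reverse_score(pairs: List[Tuple[float, float]]) -> float:
    """Mean of exclusivity * relevance over all M extracted entities."""
    if not pairs:
        return 0.0
    return sum(excl * rel for excl, rel in pairs) / len(pairs)


# (exclusivity, relevance) for "tomorrow" and "Chengdu" from the worked example.
s_reverse = reverse_score([(0.85, 0.92), (0.78, 0.95)])
```

Raw PMI is not bounded to [0,1], so a production system would presumably rescale it before use; the example's relevance values of 0.92 and 0.95 are taken as given here.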
[0089] The process of reordering the candidate intents involves calculating a comprehensive confidence score S_comprehensive = α × S_initial + β × S_forward + γ × S_reverse based on the initial confidence score S_initial, the forward matching score S_forward, and the reverse matching score S_reverse after the candidate intents are fused. Here, α is a preset initial confidence score weighting coefficient, β is a preset forward verification score weighting coefficient, and γ is a preset reverse verification score weighting coefficient, with α + β + γ = 1. The candidate intents are then reordered in descending order based on the comprehensive confidence score S_comprehensive. After reordering, the candidate intent with the highest comprehensive confidence score is the one that best matches the user's true intent and can be prioritized as the basis for subsequent business execution. This effectively avoids matching bias caused by a single scoring dimension and improves the accuracy of intent recognition.
[0090] For example, in the above example "What will the weather be like in Chengdu tomorrow?", with the preset α=0.3, β=0.4, γ=0.3, the comprehensive confidence is S_comprehensive=0.899×0.3+0.972×0.4+0.7615×0.3=0.2697+0.3888+0.2285≈0.887. There are no other candidate intents, so the only candidate intent after sorting is the weather query intent (QUERY_WEATHER).
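The weighted combination and the descending reordering can be sketched as follows. The function names and the dictionary layout of the candidates are assumptions for illustration; the weights and scores are those of the example.

```python
def comprehensive_confidence(s_initial, s_forward, s_reverse,
                             alpha=0.3, beta=0.4, gamma=0.3):
    """alpha * S_initial + beta * S_forward + gamma * S_reverse."""
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma must equal 1")
    return alpha * s_initial + beta * s_forward + gamma * s_reverse

def rerank(candidates):
    """Sort candidates by comprehensive confidence, highest first.

    candidates maps intent name -> (S_initial, S_forward, S_reverse).
    """
    scored = {name: comprehensive_confidence(*scores)
              for name, scores in candidates.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)

ranked = rerank({"QUERY_WEATHER": (0.899, 0.972, 0.7615)})
best_intent, best_score = ranked[0]
```

Here best_score evaluates to 0.88695, which rounds to the 0.887 stated in the example.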
[0091] S105, determine the current active intent from the context, analyze the intent transfer type, and perform intent disambiguation.
[0092] Specifically, in a preferred embodiment of the present invention,
[0093] Based on the contextual information, it is determined whether there is an active intent in the current session, and the user's intent transfer type is analyzed based on the new input content. The intent transfer type is one of continue, repair, refine, switch, or cancel, and is specifically determined by calculating semantic similarity using BERT.
[0094] 1. Continue: If there is an active intent, and the semantic similarity between the new input text and the previous user input text is greater than the preset threshold T_continue, then it is determined as "continue", indicating that the user wants to continue interacting around the current active intent;
[0095] 2. Repair: If there is an active intent, and the semantic similarity between the new input text and the previous user input text is greater than the preset threshold T_repair, and the new input text modifies the entity parameter value that already exists in the current active intent, then it is determined to be an intent transfer type "repair", which indicates that the user wants to correct the error or inaccurate information in the current active intent;
[0096] 3. Refine: If there is an active intent, and the semantic similarity between the new input text and the previous user input text is greater than the preset threshold T_refine, and the new input text supplements the missing necessary entities required by the current active intent, then it is determined to be the intent transfer type "refine", which indicates that the user wants to supplement and improve the key information of the current active intent;
[0097] 4. Switch: If there is an active intent, and the semantic similarity between the new input text and the previous user input text is less than the preset threshold T_switch, and the entity information extracted from the new input text is more suitable for another candidate intent, then it is determined to be the intent transfer type "switch", which indicates that the user wants to abandon the current active intent and switch to a new intent;
[0098] 5. Cancel: If a clear negative word or cancellation word (such as "cancel", "don't want it", "never mind") is detected in the new input text, and no new candidate intent is triggered, there is no need to calculate semantic similarity. It is then determined to be an intent transfer type "cancel", which indicates that the user wants to terminate the current active intent and no longer continue the related interaction.
[0099] The thresholds mentioned above can be adjusted according to the actual application scenario. In a preferred embodiment, the continue threshold T_continue is set to 0.85, the repair threshold T_repair is set to 0.75, the refinement threshold T_refine is set to 0.7, and the switch threshold T_switch is set to 0.5. It should be noted that these thresholds are only exemplary values, and those skilled in the art can set them according to actual needs or automatically learn them through machine learning methods.
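The five rules and the exemplary thresholds can be sketched as a small rule function. This is an illustration under stated assumptions: the semantic similarity is passed in as a precomputed number (no BERT model is run here), the boolean flags stand in for the entity comparisons described above, and the order in which the rules are tried is an assumption, since the source does not specify how overlapping conditions are resolved.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Thresholds:
    # Exemplary values from the preferred embodiment.
    t_continue: float = 0.85
    t_repair: float = 0.75
    t_refine: float = 0.70
    t_switch: float = 0.50

CANCEL_WORDS = ("cancel", "don't want it", "never mind")

def classify_transfer(text, similarity, *, has_active_intent,
                      modifies_existing_param=False,
                      fills_missing_param=False,
                      better_fits_other_intent=False,
                      triggers_new_intent=False,
                      th=Thresholds()):
    """Return the intent transfer type, or None if no rule fires."""
    if not has_active_intent:
        return None  # handled as a new intent; no transfer analysis
    lowered = text.lower()
    # Cancel is checked first and needs no similarity score.
    if any(w in lowered for w in CANCEL_WORDS) and not triggers_new_intent:
        return "cancel"
    if similarity > th.t_repair and modifies_existing_param:
        return "repair"
    if similarity > th.t_refine and fills_missing_param:
        return "refine"
    if similarity > th.t_continue:
        return "continue"
    if similarity < th.t_switch and better_fits_other_intent:
        return "switch"
    return None

kind = classify_transfer("Change the date to Friday", 0.80,
                         has_active_intent=True,
                         modifies_existing_param=True)
```

In this hypothetical call the similarity 0.80 exceeds T_repair and an existing parameter is modified, so kind is "repair".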
[0100] The intent disambiguation process addresses the problem of inaccurately determining a user's true intent when multiple candidate intents have similar confidence levels. Specifically, when the difference between the highest and second-highest overall confidence levels after reordering is less than a preset ambiguity threshold, a clarification question is generated, and at least two high-confidence candidate intents are displayed to the user for confirmation. The preset ambiguity threshold is dynamically adjusted based on the number of dialogue rounds and historical ambiguity rates. In a preferred embodiment, the dynamic ambiguity threshold is calculated using the following formula:
[0101] threshold_dynamic=threshold_base×(1+k1×turn_count) / (1+k2×ambiguity_rate_history);
[0102] In the formula, threshold_base is the base threshold, a preset initial baseline value that can be determined during initial debugging based on the business scenario; turn_count is the current dialogue round. The more rounds, the clearer or more complex the user's intent may be, requiring dynamic adjustment of the threshold sensitivity; ambiguity_rate_history is the historical ambiguity rate, which is the ratio of the number of times intent ambiguity occurred in historical dialogues to the total number of dialogues. It is used to optimize the threshold based on historical data and reduce repeated ambiguity judgments; k1 is the dialogue round adjustment coefficient, and k2 is the historical ambiguity rate adjustment coefficient. They are used to balance the influence of dialogue rounds and historical ambiguity rate on the dynamic threshold and can be preset based on actual debugging results to ensure that the dynamic threshold can adapt to the needs of different dialogue scenarios.
[0103] For example, in the example of "What will the weather be like in Chengdu tomorrow?" above, there is no current intent ID and no ongoing intent, so it is determined to be a new intent and no intent transfer analysis is needed; there is only one candidate intent, which is unambiguous and does not require confirmation from the user, so the preliminary identification result is a weather query intent (QUERY_WEATHER).
[0104] It should be noted that the above dynamic ambiguity threshold formula is only one preferred implementation. Those skilled in the art can use other functional forms to achieve dynamic adjustment of the threshold based on the same principle.
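The preferred threshold formula and the resulting clarification decision can be sketched as below. The function names are assumptions; the parameter values (threshold_base=0.15, k1=0.05, k2=2, historical ambiguity rate 0.15, fourth dialogue round) and the two confidence scores are taken from the multi-turn example in this description.

```python
def dynamic_threshold(threshold_base, turn_count, ambiguity_rate_history, k1, k2):
    """threshold_base * (1 + k1 * turn_count) / (1 + k2 * ambiguity_rate_history)."""
    return threshold_base * (1 + k1 * turn_count) / (1 + k2 * ambiguity_rate_history)

def needs_clarification(scores, threshold):
    """True when the two highest comprehensive confidences differ by less
    than the threshold; a single candidate is never ambiguous."""
    if len(scores) < 2:
        return False
    top, second = sorted(scores, reverse=True)[:2]
    return (top - second) < threshold

t = dynamic_threshold(0.15, turn_count=4, ambiguity_rate_history=0.15,
                      k1=0.05, k2=2)
ask_user = needs_clarification([0.923, 0.695], t)
```

Here t is approximately 0.138 and the gap between the two scores is 0.228, so ask_user is False and no clarification question would be generated.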
[0105] S106, output the final recognition intent and trigger the corresponding task execution.
[0106] Specifically, in a preferred embodiment of the present invention,
[0107] Before the final recognized intent is determined, dynamic parameter filling checks whether the necessary entity parameters are complete and fills in any missing parameters statically or dynamically according to the preset default value type. A dynamic default value is filled according to the current session context, time, geographical location, user profile, etc. Finally, the final recognized intent is output and the corresponding dialogue management or task execution process is triggered.
[0108] For example, in the above example of "What is the weather like in Chengdu tomorrow?", the necessary entity parameters for checking the weather query intent, DATE and LOCATION, have been extracted and do not need to be filled in. The intent recognition result (QUERY_WEATHER, confidence 0.887) and entity parameters (DATE=tomorrow, LOCATION=Chengdu) are output in a standardized manner, and the dialogue context is updated: current intent ID=QUERY_WEATHER, collected parameters={DATE:tomorrow, LOCATION:Chengdu}, context entity={DATE:tomorrow, LOCATION:Chengdu}, modal input record={2026-01-01 10:00:00: voice}, triggering the task execution flow of the weather query.
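The parameter completeness check and the static or dynamic default filling can be sketched as follows. This is an illustration only: the default table, the context keys user_location and preferred_class, and the TICKET_COUNT parameter are hypothetical names, and representing a dynamic default as a function of the session context is a design assumption of this sketch.

```python
def fill_parameters(required, collected, defaults, context):
    """Return (params, still_missing) after applying defaults.

    A default is either a plain value (static) or a callable that derives
    a value from the session context (dynamic).
    """
    params = dict(collected)
    still_missing = []
    for name in required:
        if name in params:
            continue
        default = defaults.get(name)
        value = default(context) if callable(default) else default
        if value is None:
            still_missing.append(name)  # must be asked from the user
        else:
            params[name] = value
    return params, still_missing

# Hypothetical default table for a train-booking intent.
defaults = {
    "DEPARTURE": lambda ctx: ctx.get("user_location"),  # dynamic: location
    "CLASS": lambda ctx: ctx.get("preferred_class"),    # dynamic: user profile
    "TICKET_COUNT": 1,                                  # static default
}

params, still_missing = fill_parameters(
    required=["DATE", "DESTINATION", "DEPARTURE", "CLASS", "TICKET_COUNT"],
    collected={"DATE": "tomorrow", "DESTINATION": "Beijing"},
    defaults=defaults,
    context={"user_location": "Chengdu", "preferred_class": "second class seat"},
)
```

Parameters with no default, or whose dynamic default cannot be resolved from the context, are returned in still_missing so that the dialogue manager can ask the user for them.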
[0109] To demonstrate the technical effects of this invention more comprehensively, a comprehensive application example of a multi-turn dialogue scenario (intent transfer and dynamic filling) is provided below, showing the integrated application of intent transfer determination, dynamic threshold disambiguation, and dynamic default value filling. As shown in Figure 2, a four-round dialogue is demonstrated, with the following core parameter configuration: base threshold threshold_base=0.15, dialogue round adjustment coefficient k1=0.05, historical ambiguity rate adjustment coefficient k2=2, initial confidence weight coefficient α=0.3, forward verification score weight coefficient β=0.4, reverse verification score weight coefficient γ=0.3, and historical ambiguity rate ambiguity_rate_history=0.15.
[0110] S201, first round, user input: "Book me a train ticket to Beijing for tomorrow."
[0111] System processing: Multi-strategy fusion identifies the intent as booking train tickets (BOOK_TRAIN), with an initial confidence of 0.94; Entity extraction: DATE="tomorrow" (0.98), DESTINATION="Beijing" (0.97); Bidirectional matching: calculated using the formulas, the forward matching score is 0.96, the reverse matching score is 0.95, and the overall confidence is 0.94×0.3+0.96×0.4+0.95×0.3≈0.95; Checking required parameters: the departure location (DEPARTURE) and seat class (CLASS) are missing; Dynamic filling: the departure location is filled with "Chengdu" based on geographical location (the user's current location is Chengdu), and the seat class is filled with "second class seat" based on the user profile (historical preference); Output the intent, trigger the ticket booking process, and update the context.
[0112] S202, second round, user input: "Change to the afternoon of the day after tomorrow."
[0113] System processing: Obtain context: The current active intent is to book train tickets (BOOK_TRAIN), and the collected parameters are {DATE: "tomorrow", DESTINATION: "Beijing", DEPARTURE: "Chengdu", CLASS: "second class seat"}; Entity extraction: DATE = "the day after tomorrow" (0.97), TIME = "afternoon" (0.85); Intent transfer determination: The correction word "change to" was detected, and BERT calculated the semantic similarity = 0.80 > 0.75, which was determined to be a repair type; Reuse the departure point "Chengdu", destination "Beijing", and seat "second class seat"; Update intent parameters: DATE is updated to "the day after tomorrow", and TIME = "afternoon" is added.
[0114] S203, third round, user input: "Never mind, I won't order."
[0115] System processing: No valid entities were extracted; the cancellation terms "never mind" and "I won't order" were detected, and there were no new intent keywords. The input was determined to be a cancel type, the current task was terminated, and the active intent was cleared.
[0116] S204, fourth round, user input: "Let's check the weather in Chengdu."
[0117] System processing: Obtain context: there is no active intent; ambiguity arises in the initial multi-strategy fusion: the initial confidence of the weather query intent (QUERY_WEATHER) is 0.89 and the initial confidence of the traffic query intent (QUERY_TRAFFIC) is 0.87; Entity extraction: LOCATION="Chengdu" (0.98); Bidirectional matching: S_forward=0.95, S_reverse=0.92 for QUERY_WEATHER; S_forward=0.65, S_reverse=0.58 for QUERY_TRAFFIC; Overall confidence of QUERY_WEATHER = 0.89×0.3+0.95×0.4+0.92×0.3=0.267+0.38+0.276=0.923; Overall confidence of QUERY_TRAFFIC = 0.87×0.3+0.65×0.4+0.58×0.3=0.261+0.26+0.174=0.695; Intent disambiguation: threshold_dynamic=0.15×(1+0.05×4) / (1+2×0.15)=0.15×1.2 / 1.3≈0.138; the difference between the two overall confidences is 0.923-0.695=0.228>0.138, so no clarification is needed and the weather query intent (QUERY_WEATHER) is selected directly; Dynamic filling: DATE is filled with "today" according to the current time; Output the intent and trigger the weather query.
[0118] This example fully demonstrates the invention's ability to accurately determine intent transfer types, intelligently fill in dynamic default values, and effectively handle ambiguous intents in multi-turn dialogues.
[0119] According to an embodiment of the present invention, the intelligent interactive intent recognition method based on the fusion of multi-strategy and bidirectional entity matching combines the advantages of rules, templates, deep learning, and vector matching, and thereby overcomes the shortcomings of traditional single intent recognition or entity extraction methods. The method receives user voice or text input; performs standardized preprocessing on the input and, combined with the conversation context, generates candidate intents using a fusion of multi-strategy mechanisms; extracts entity information from the text using general named entity recognition; re-ranks the candidate intents based on bidirectional entity matching scores, which include forward matching scores based on entity coverage and matching degree and reverse matching scores based on entity specificity and relevance; determines the currently active intent based on the context, analyzes the intent transfer type, and performs intent disambiguation; and outputs the final recognized intent and triggers the execution of the corresponding task. The present invention effectively addresses insufficient context awareness, missing entity parameters, and ambiguous intents in multi-turn dialogues, improving the accuracy of intent recognition.
[0120] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the specific implementation of the present invention. Any modifications or equivalent substitutions that do not depart from the spirit and scope of the present invention should be covered within the scope of protection of the claims of the present invention.
Claims
1. A method for intelligent interactive intent recognition based on the fusion of multi-strategy and bidirectional entity matching, characterized in that, The method includes: receiving a user's voice or text input, and performing speech-to-text processing on the voice input to obtain a unified text input; performing standardized preprocessing on the text input and, in combination with the session context, fusing a multi-strategy mechanism of vector matching, template matching, deep learning model classification and rule matching to generate candidate intents and their initial confidence based on the weight coefficient of each strategy; performing general named entity recognition on the standardized preprocessed text to extract entity information; reordering the candidate intents using the entity bidirectional matching score, which includes a forward matching score based on entity coverage and matching degree and a reverse matching score based on entity specificity and relevance; determining the current active intent in combination with the context, analyzing the intent transfer type, and performing intent disambiguation; and outputting the final recognized intent and triggering the execution of the corresponding task.
2. The method according to claim 1, characterized in that, The process of standardizing and preprocessing the text input, and combining it with the session context, integrates a multi-strategy mechanism of vector matching, template matching, deep learning model classification, and rule matching, and generates candidate intents and their initial confidence scores based on the weight coefficients corresponding to each strategy, specifically includes: The text input is cleaned and preprocessed by word segmentation, and the processing results of the two modalities are unified into standardized interactive text. The current session's dialogue context is obtained based on the session ID. The dialogue context includes the current intent ID, collected parameters, context entities, interaction history, and modal input records. A multi-strategy mechanism is used to perform preliminary intent recognition on the standardized interactive text, generating multiple candidate intents and their corresponding initial confidence scores. The multi-strategy mechanism specifically includes four strategies: vector matching, template matching, deep learning model classification, and rule matching. Each strategy is configured with dynamically adjustable weight coefficients. For each candidate intent, the initial confidence score obtained under each strategy is multiplied by the corresponding weight coefficient, and the confidence scores of the same intent are weighted and averaged to obtain the initial confidence score S_initial after fusion of the candidate intents. The calculation formula is as follows: S_initial=∑(conf_initial_i×wc_i); In the formula, conf_initial_i is the i-th initial confidence level, wc_i is the weight coefficient corresponding to the strategy, and ∑wc_i=1.
3. The method according to claim 1, characterized in that, The step of performing general named entity recognition on the standardized preprocessed text to extract entity information specifically includes: Perform generic named entity recognition on the standardized preprocessed text to extract one or more entities, each entity containing entity type, entity value, confidence level and source.
4. The method according to claim 1, characterized in that, The forward matching score based on entity coverage and matching degree specifically includes: The extracted entity information is forward matched with the preset entity parameters of each candidate intent, and the forward matching score S_forward is calculated using the following formula: S_forward=(N_matched / N_required)×(Σ(conf_entity_i×w_i) / N_matched); In the formula, N_matched is the number of required entities successfully matched, N_required is the total number of required entities for the intent, conf_entity_i is the confidence level of the i-th matched entity, w_i is the preset importance weight coefficient of the entity in the intent, and Σw_i=1.
5. The method according to claim 1, characterized in that, The reverse matching score based on entity specificity and relevance specifically includes: Based on the extracted entity information, a matching intent set is found in reverse, and the reverse matching score S_reverse is calculated. The calculation formula is as follows: S_reverse=(Σexclusivity(e_j,intent_k)×relevance(e_j,intent_k)) / M; In the formula, M is the total number of entities extracted, exclusivity(e_j, intent_k) is the exclusivity of entity e_j to intent_k, and relevance(e_j, intent_k) is the relevance between entity e_j and intent_k.
6. The method according to claim 1, characterized in that, The reordering of the candidate intents specifically includes: The comprehensive confidence S_comprehensive = α × S_initial + β × S_forward + γ × S_reverse is calculated from the initial confidence S_initial after candidate intent fusion, the forward matching score S_forward, and the reverse matching score S_reverse, where α is a preset initial confidence weight coefficient, β is a preset forward verification score weight coefficient, and γ is a preset reverse verification score weight coefficient, and α + β + γ = 1. The candidate intents are then reordered in descending order of S_comprehensive.
7. The method according to claim 1, characterized in that, The process of determining the current active intent based on context, analyzing the intent transfer type, and handling intent disambiguation specifically includes: Based on the context information, it is determined whether there is an active intent in the current session, and the user's intent transfer type is analyzed based on the new input content. The intent transfer type is one of continue, repair, refine, switch, or cancel. When the difference between the highest overall confidence score and the second highest overall confidence score after reordering is less than a preset ambiguity threshold, a clarification question is generated, and at least two high-confidence candidate intents are displayed to the user for confirmation. The preset ambiguity threshold is dynamically adjusted according to the dialogue round and the historical ambiguity rate.
8. The method according to claim 1, characterized in that, The step of outputting the final recognized intent and triggering the execution of the corresponding task specifically includes: Before determining the final recognized intent, checking whether the necessary entity parameters are complete, and filling in the missing parameters statically or dynamically according to the preset default value type; outputting the final recognized intent and the corresponding entity parameters, and triggering the corresponding dialogue management or task execution process.