Intention discrimination model training method and apparatus, and computer device

By screening and improving the intent discrimination model in the financial services field, and using role-playing models to generate diverse counterfactual dialogues, the problems of data scarcity and insufficient coverage of long-tail scenarios were solved, thereby improving the performance and robustness of the model.

CN122021663B · Active · Publication Date: 2026-06-19 · MJOYS COM

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
MJOYS COM
Filing Date
2026-04-10
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In the financial services sector, existing technologies struggle to cover long-tail and extreme scenarios in their discriminative models, and lack optimization mechanisms to address model weaknesses, resulting in insufficient data diversity and inadequate model performance.

Method used

By screening high-quality dialogue samples, extracting key variables to construct an instruction fine-tuning dataset, improving the model architecture to support counterfactual dialogue generation, using role-playing models to simulate diverse dialogue paths, and mining low-confidence samples and boundary confusion samples through dynamic game mechanisms, a dynamic game relationship is established between the role model and the intent discrimination model.

Benefits of technology

It significantly improves the performance and reliability of the intent discrimination model in practical applications, systematically expands the rare scenario and intent boundary samples, and enhances the model's ability to handle various situations.



Abstract

This invention discloses a training method, apparatus, and computer equipment for an intent discrimination model. The method includes: screening dialogue samples and extracting key variables to construct an instruction fine-tuning dataset; improving the model architecture and training the improved model architecture to obtain a role-playing model; generating counterfactual dialogues based on the modified key variables using the role-playing model to obtain dialogue content; inputting the dialogue content into the intent discrimination model, and establishing a dynamic game relationship between the role model and the intent discrimination model by mining low-confidence samples and boundary confusion samples, and feeding back the variant data of the mined samples into the training process of the intent discrimination model. By implementing the method of this invention, more diverse and challenging training data can be generated through a dynamic game mechanism and intervention in key variables, systematically expanding rare scenarios and intent boundary samples, and significantly improving the performance and reliability of the intent discrimination model in practical applications.

Description

Technical Field

[0001] This invention relates to artificial intelligence, and more specifically to methods, apparatus and computer equipment for training intention discrimination models.

Background Technology

[0002] In modern intelligent outbound calling systems, accurately identifying user intent is crucial for improving service quality and efficiency. Current mainstream methods rely on neural network-based language models or discriminative models to achieve this. However, in financial services sectors such as banking, the amount of data available for training these models is extremely limited due to the privacy concerns and strict restrictions on customer call data. To address this challenge, using large language models to generate synthetic dialogue data is considered a viable solution. By having large language models play specific roles to simulate real-world dialogues, realistic multi-turn human-computer interaction samples can be generated.

[0003] While this approach demonstrates significant potential, existing technologies still face two major challenges: first, they struggle to cover long-tail and extreme scenarios; and second, they lack optimization mechanisms to address model weaknesses. Specifically, typical generative models tend to reproduce common dialogue patterns while neglecting rare situations crucial to business operations, such as complex complaints or extreme customer emotions. Furthermore, the one-way generation process cannot be adjusted according to the performance of the intent discrimination model: it keeps producing "simple samples" that the model can already identify well, and it rarely supplies the challenging samples that are most valuable for model improvement.

[0004] Therefore, it is necessary to design a new method to generate more diverse and challenging training data through dynamic game mechanisms and intervention in key variables. In particular, by "intervening" in key causal variables in historical dialogues, counterfactual "what if" dialogue paths can be generated to systematically expand rare scenarios and intent boundary samples, significantly improving the performance and reliability of intent discrimination models in practical applications.

Summary of the Invention

[0005] The purpose of this invention is to overcome the shortcomings of the prior art and provide an intention discrimination model training method, apparatus and computer equipment.

[0006] To achieve the above objectives, the present invention adopts the following technical solution: an intent discrimination model training method, comprising:

[0007] Filter dialogue samples and extract key variables to construct an instruction fine-tuning dataset, improve the model architecture to support the generation of different dialogue paths based on externally specified variables, and train the improved model architecture based on the dataset to obtain a role-playing model.

[0008] Counterfactual dialogue generation is performed using a role-playing model based on the modified key variables to obtain dialogue content;

[0009] The dialogue content is input into the intent discrimination model, and low-confidence samples and boundary confusion samples are mined. The variant data of the mined samples are fed back into the training process of the intent discrimination model to establish a dynamic game relationship between the role model and the intent discrimination model.

[0010] Its further technical solution is as follows: The process of screening dialogue samples and extracting key variables to construct an instruction fine-tuning dataset, improving the model architecture to support the generation of different dialogue paths based on externally specified variables, and training the improved model architecture based on the dataset to obtain a role-playing model includes:

[0011] Select high-quality samples from the anonymized historical dialogue texts that are free from intent judgment errors, speech recognition errors, or logical inconsistencies.

[0012] Key variables affecting the dialogue flow are labeled in the high-quality samples;

[0013] The key variables are converted into role control prompts, and the user's speech content is completed based on the role control prompts using a general LLM and then verified to obtain a fine-tuning dataset.

[0014] Based on the Transformer Decoder architecture, a controllable variable embedding layer independent of text encoding is added to obtain an improved model architecture;

[0015] The improved model architecture is then supervised and fine-tuned using the fine-tuning dataset, enabling it to generate corresponding responses based on the input role control prompts, thus obtaining a role-playing model.

[0016] The further technical solution is as follows: the key variables include user attributes and business scenario characteristics.

[0017] The further technical solution is as follows: converting the key variables into role control prompts, using a general LLM to complete the user's speech content based on the role control prompts, and performing verification to obtain a fine-tuning dataset includes:

[0018] The key variables are transformed into structured role control prompts;

[0019] The robot's speech is retained, and the user's speech is completed using a general LLM based on the provided role control prompts to obtain the user's speech content;

[0020] The user's speech content is verified to obtain a fine-tuning dataset.

[0021] Its further technical solution is as follows: the counterfactual dialogue generation based on the modified key variables using a role-playing model to obtain dialogue content includes:

[0022] Modify the actual values of the key variables and design new role control prompts;

[0023] By using a role-playing model combined with the role control prompts, a dialogue between customer service and user is simulated, and a virtual response that conforms to the intervention settings is generated based on the actual values to obtain the dialogue content.

[0024] Its further technical solution is as follows: The role-playing model, combined with the role control prompts, simulates a dialogue between customer service and the user, generating a virtual response that conforms to the intervention settings based on the actual values to obtain the dialogue content, including:

[0025] When the robot initiates a conversation, the role-playing model generates a corresponding user response based on the actual values, the robot's speech, and the role control prompts to obtain the conversation content.

[0026] The further technical solution is as follows: The dialogue content is input into the intent discrimination model, and low-confidence samples and boundary confusion samples are mined. Variant data of the mined samples are then fed back into the training process of the intent discrimination model to establish a dynamic game relationship between the role model and the intent discrimination model, including:

[0027] The dialogue content is input into the intent discrimination model so that the intent discrimination model can evaluate the classification result and confidence level of each dialogue in real time;

[0028] Based on the output of the intent discrimination model, samples with confidence levels that meet the requirements and boundary confusion samples are selected as seed data to obtain difficult samples;

[0029] The difficult samples are fed back into the role model to generate variant data;

[0030] After the difficult samples and variant data are validated by the model, they are added to the training set. The intention discrimination model is then trained again using the training set to establish a dynamic game relationship between the role model and the intention discrimination model.
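
As a non-limiting illustration of the mining step described above, the following Python sketch separates boundary confusion samples from low-confidence samples using the class probabilities output by an intent discrimination model. The `Prediction` type, the function name, and both threshold values are hypothetical; the text does not specify how "low confidence" or "boundary confusion" is measured, so a top-1 probability threshold and a top-1/top-2 margin are assumed here.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    text: str                 # generated dialogue content
    probs: dict[str, float]   # intent label -> probability (assumed output format)


def mine_hard_samples(preds, conf_threshold=0.6, margin_threshold=0.15):
    """Split predictions into (low_confidence, boundary_confusion) seed lists.

    Both thresholds are illustrative assumptions, not values from the source.
    """
    low_conf, boundary = [], []
    for p in preds:
        ranked = sorted(p.probs.values(), reverse=True)
        top1 = ranked[0]
        top2 = ranked[1] if len(ranked) > 1 else 0.0
        if top1 - top2 < margin_threshold:
            # Two intents are almost equally likely: boundary confusion.
            boundary.append(p)
        elif top1 < conf_threshold:
            # No intent is predicted with enough confidence.
            low_conf.append(p)
    return low_conf, boundary


preds = [
    Prediction("I might consider it, or maybe not", {"accept": 0.45, "refuse": 0.42, "other": 0.13}),
    Prediction("Hmm", {"accept": 0.50, "refuse": 0.25, "other": 0.25}),
    Prediction("Yes, please sign me up", {"accept": 0.90, "refuse": 0.10}),
]
low_conf, boundary = mine_hard_samples(preds)
```

In this sketch the mined seeds would then be passed back to the role model to generate variant data, and only validated samples would be added to the training set.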

[0031] The present invention also provides an intent discrimination model training apparatus, comprising:

[0032] The role-playing model training unit is used to filter dialogue samples and extract key variables to build an instruction fine-tuning dataset, improve the model architecture to support the generation of different dialogue paths based on externally specified variables, and train the improved model architecture based on the dataset to obtain the role-playing model.

[0033] The dialogue content generation unit is used to generate counterfactual dialogues based on the modified key variables using a role-playing model, in order to obtain dialogue content.

[0034] The iterative optimization unit is used to input the dialogue content into the intent discrimination model, and to establish a dynamic game relationship between the role model and the intent discrimination model by mining low-confidence samples and boundary confusion samples, and feeding back the variant data of the mined samples into the training process of the intent discrimination model.

[0035] Its further technical solution is: the role-playing model training unit includes:

[0036] The selection sub-unit is used to select high-quality samples from the anonymized historical dialogue text that are free from intent judgment errors, speech recognition errors, or logical inconsistencies.

[0037] The annotation subunit is used to annotate key variables that affect the dialogue flow in the high-quality samples.

[0038] The dataset construction subunit is used to convert the key variables into role control prompts, to use a general LLM to complete the user's speech content based on the role control prompts, and to perform verification, so as to obtain a fine-tuning dataset.

[0039] The improvement subunit is used to add a controllable variable embedding layer independent of text encoding based on the Transformer Decoder architecture, so as to obtain an improved model architecture;

[0040] The fine-tuning training subunit is used to perform supervised fine-tuning of the improved model architecture using the fine-tuning dataset, so that the improved model architecture generates corresponding response content based on the input role control prompt words, thereby obtaining the role-playing model.

[0041] The present invention also provides a computer device, the computer device including a memory and a processor, the memory storing a computer program, and the processor executing the computer program to implement the above-described method.

[0042] The advantages of this invention compared to existing technologies are as follows: This invention achieves the goal of generating more diverse and challenging training data through a dynamic game mechanism and intervention in key variables. Specifically, it first filters dialogue samples and extracts key variables to construct an instruction fine-tuning dataset, improving the model architecture to support the generation of different dialogue paths based on externally specified variables, thereby training a role-playing model. Then, based on the modified key variables, the role-playing model is used to generate counterfactual dialogues, simulating "what if" scenarios, particularly expanding samples for rare scenarios and intent boundaries. Finally, the generated dialogue content is input into the intent discrimination model. By mining low-confidence samples and boundary-confused samples, and feeding the variations of these samples back into the model training process, a dynamic game relationship is established between the role model and the intent discrimination model. This allows the model to systematically learn a wider range of more complex situations, significantly improving its performance and reliability in practical applications. This method effectively solves the problems of scarce real business data and insufficient coverage of long-tail scenarios, improving the model's ability to handle various situations.

[0043] The present invention will be further described below with reference to the accompanying drawings and specific embodiments.

Attached Figure Description

[0044] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the following description of the embodiments will be briefly introduced. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0045] Figure 1 is a flowchart of the intent discrimination model training method provided in an embodiment of the present invention;

[0046] Figure 2 is a schematic diagram of a sub-process of the intent discrimination model training method provided in an embodiment of the present invention;

[0047] Figure 3 is a schematic diagram of a sub-process of the intent discrimination model training method provided in an embodiment of the present invention;

[0048] Figure 4 is a schematic diagram of a sub-process of the intent discrimination model training method provided in an embodiment of the present invention;

[0049] Figure 5 is a schematic diagram of a sub-process of the intent discrimination model training method provided in an embodiment of the present invention;

[0050] Figure 6 is a schematic block diagram of an intent discrimination model training device provided in an embodiment of the present invention;

[0051] Figure 7 is a schematic block diagram of a computer device provided in an embodiment of the present invention.

Detailed Implementation

[0052] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0053] It should be understood that, when used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.

[0054] It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms unless the context clearly indicates otherwise.

[0055] It should also be further understood that the term "and / or" as used in this specification and the appended claims refers to any combination of one or more of the associated listed items and all possible combinations, and includes such combinations.

[0056] Please refer to Figure 1, which is a flowchart of the intent discrimination model training method provided in this embodiment of the invention. This method is applied to a server. Through a dynamic game mechanism and intervention in key variables, it achieves the goal of generating more diverse and challenging training data. Specifically, high-quality samples are first selected from de-identified historical dialogues, and key variables such as user attributes and business scenario features are extracted to construct an instruction fine-tuning dataset. Then, the model architecture is improved (by adding a controllable variable embedding layer based on the Transformer Decoder) to support the generation of different dialogue paths based on externally specified variables, thereby training a role-playing model. Next, new role control prompts are designed by modifying the true values of key variables. Counterfactual dialogues are generated using the role-playing model to simulate interactions between customer service representatives and users in different contexts, producing virtual responses that conform to the intervention settings, resulting in diverse dialogue content. This dialogue content is input into the intent discrimination model. By mining low-confidence samples and boundary confusion samples, and feeding these difficult samples and their variations back into the role model, more diverse training data is generated. Finally, this data is added to the training set to retrain the intent discrimination model, establishing a dynamic game relationship between the role model and the intent discrimination model. This method can not only systematically expand the number of samples for rare scenarios and intent boundaries, but also significantly improve the performance and reliability of intent discrimination models in practical applications.

[0057] As shown in Figure 1, the method includes the following steps S110 to S130.

[0058] S110. Filter dialogue samples and extract key variables to construct an instruction fine-tuning dataset, improve the model architecture to support the generation of different dialogue paths based on externally specified variables, and train the improved model architecture based on the dataset to obtain a role-playing model.

[0059] In this embodiment, the role-playing model refers to a generative language model that accepts intervention from causal variables and generates diverse and counterfactual dialogue data based on specific role-controlled prompts, aiming to simulate the responses of real users to enrich the training samples.

[0060] In one embodiment, referring to Figure 2, the above-mentioned step S110 may include steps S111 to S115.

[0061] S111. Select high-quality samples from the anonymized historical dialogue texts that do not contain errors in intent judgment, speech recognition, or logical inconsistencies.

[0062] First, high-quality samples without intent judgment errors, speech recognition errors, or logical inconsistencies are selected from the anonymized historical dialogue text. This step ensures the quality of the data used for subsequent processing and avoids affecting the accuracy of the model due to errors in historical data.

[0063] In this embodiment, the quality screening criteria are as follows:

[0064] Intent judgment error: Exclude samples that were misclassified in the original interaction.

[0065] Speech Recognition (ASR) Errors: Exclude dialogues that are misunderstood due to inaccurate speech recognition.

[0066] Logical incoherence: Exclude dialogue segments that have obvious logical problems or are incoherent.
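
As a non-limiting illustration, the three screening criteria above can be expressed as a simple filter. The `DialogueSample` type and its flag names are hypothetical; the sketch assumes each anonymized dialogue has already been marked (manually or by upstream tooling) with whether it exhibits each kind of error.

```python
from dataclasses import dataclass


@dataclass
class DialogueSample:
    text: str
    intent_misjudged: bool = False  # misclassified in the original interaction
    asr_error: bool = False         # misunderstood due to inaccurate speech recognition
    incoherent: bool = False        # obvious logical problems or incoherence


def select_high_quality(samples):
    """Keep only samples that exhibit none of the three error types."""
    return [
        s for s in samples
        if not (s.intent_misjudged or s.asr_error or s.incoherent)
    ]


samples = [
    DialogueSample("Robot: Hello. User: Yes, I would like a loan."),
    DialogueSample("Robot: Hello. User: Lone, uh, lawn?", asr_error=True),
    DialogueSample("Robot: Hello. User: No thanks.", intent_misjudged=True),
]
high_quality = select_high_quality(samples)
```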

[0067] S112. Label the key variables that affect the dialogue process in the high-quality samples; the key variables include user attributes and business scenario characteristics.

[0068] In this embodiment, key variables that may affect the dialogue flow are labeled in these high-quality samples. These variables may include user state variables (such as age, mood, cooperation level, etc.) and business scenario variables (such as processing intent, business type, etc.). Labeling these variables is to enable subsequent control over the generated dialogue content.

[0069] Specifically, variable types:

[0070] User state variables, such as age, mood, and cooperation level, can influence the direction and style of the conversation.

[0071] Business scenario variables: such as the intent to apply (explicit / vague) and the type of business (financial management / loan, etc.). These variables help simulate different service scenarios.

[0072] Annotation methods: Annotation can be done through manual review or automatically using natural language processing tools. For complex variables, a combination of manual review and annotation may be necessary to ensure accuracy.

[0073] S113. The key variables are converted into role control prompts, and the user's speech content is completed based on the role control prompts using a general LLM, and then verified to obtain a fine-tuning dataset.

[0074] In this embodiment, the key variables annotated above are transformed into structured role control prompts, and a general-purpose large language model (LLM) is used to complete the user's speech based on these prompts. This process also includes manual verification of the automatically completed data to improve its logical consistency and naturalness of language, ultimately forming the fine-tuning dataset.

[0075] In one embodiment, referring to Figure 3, the above step S113 may include steps S1131 to S1133.

[0076] S1131. Convert the key variables into structured role control prompts.

[0077] In this embodiment, the goal of this step is to convert key variables extracted from historical dialogue samples into a format that the model can understand and process. Each key variable represents a specific role attribute or situational condition, such as user state (emotion, cooperation level, etc.) and business scenario (intention to process, business type, etc.). These variables need to be encoded into a machine-readable form, i.e., role control prompts. This process typically includes the following sub-steps:

[0078] Variable identification: Identify which variables are necessary to guide dialogue generation.

[0079] Encoding conversion: Convert the identified variable values into text descriptions or numerical labels for subsequent processing.

[0080] Prompt construction: Based on the above information, construct a prompt template containing all necessary variables. For example, "Suppose you are communicating with a customer who is [age] years old, whose emotional state is [emotional state], and who has a need for [business type]."
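
As a non-limiting illustration of the three sub-steps above, the following Python sketch fills the example template from a dictionary of annotated key variables. The function name and the dictionary keys are hypothetical; only the template wording comes from the example above.

```python
# Template wording follows the example in the text; placeholder names are assumed.
TEMPLATE = (
    "Suppose you are communicating with a customer who is {age} years old, "
    "whose emotional state is {emotional_state}, "
    "and who has a need for {business_type}."
)

REQUIRED_VARIABLES = ("age", "emotional_state", "business_type")


def build_role_control_prompt(variables: dict) -> str:
    """Convert annotated key variables into a role control prompt."""
    # Variable identification: make sure every necessary variable is present.
    missing = [name for name in REQUIRED_VARIABLES if name not in variables]
    if missing:
        raise KeyError(f"missing key variables: {missing}")
    # Encoding conversion and prompt construction: values are rendered as text.
    return TEMPLATE.format(**{name: variables[name] for name in REQUIRED_VARIABLES})


prompt = build_role_control_prompt(
    {"age": 35, "emotional_state": "calm", "business_type": "a loan"}
)
```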

[0081] S1132. Retain the robot's speech and use a general LLM to complete the user's speech based on the provided role control prompts to obtain the user's speech content.

[0082] In this embodiment, a general-purpose large language model (LLM) is used at this stage to simulate the user's response. This process involves:

[0083] Preparing input: The previously constructed role control prompts are combined with the known robot speech content to form a complete input context.

[0084] Response generation: The general-purpose LLM automatically generates potential user responses based on the provided context. The LLM should understand and take into account the various conditions in the role control prompts so that the generated natural language responses align with the role's established characteristics.

[0085] Diversification: To enrich the dataset, the LLM can generate multiple different versions of the user's statements for the same input context by adjusting the random seed or parameter settings.
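
As a non-limiting illustration, the following Python sketch assembles the input context and collects several distinct user replies by varying the random seed. The `generate` callable stands in for a call to a general LLM and is purely an assumption, as is the context layout; `toy_llm` is a placeholder that ignores the context and exists only so that the sketch runs without an external model.

```python
import random
from typing import Callable, List


def build_context(role_prompt: str, robot_utterance: str) -> str:
    """Combine the role control prompt with the retained robot speech (assumed layout)."""
    return f"{role_prompt}\nRobot: {robot_utterance}\nUser:"


def generate_variants(
    generate: Callable[[str, random.Random], str],
    context: str,
    seeds: List[int],
) -> List[str]:
    """Call the generator once per seed and keep each distinct reply once."""
    seen, variants = set(), []
    for seed in seeds:
        reply = generate(context, random.Random(seed))
        if reply not in seen:
            seen.add(reply)
            variants.append(reply)
    return variants


def toy_llm(context: str, rng: random.Random) -> str:
    # Placeholder for a real LLM call; it ignores the context entirely.
    return rng.choice(["I am not interested.", "Tell me more.", "Who is this?"])


context = build_context("Emotional state: calm", "Hello, may I introduce our product?")
variants = generate_variants(toy_llm, context, seeds=[1, 2, 3, 4, 5])
```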

[0086] S1133. Verify the user's speech content to obtain a fine-tuning dataset.

[0087] The final step is to perform a quality check on the generated user speech content to ensure its logical coherence and linguistic fluency. This step is crucial because it directly impacts the quality of the final fine-tuning dataset. Specifically, this includes:

[0088] Consistency check: Verifies whether the user's statements are consistent with the given role control prompts, ensuring that there are no logical contradictions.

[0089] Naturalness assessment: The naturalness of the generated text is checked by manual review or by using automated assessment tools to ensure that it reads like a real conversation.

[0090] Error correction: Make necessary modifications to address the identified issues, and rerun the generation process if necessary.

[0091] Integration into the dataset: The high-quality user speech content that has passed the above checks and corrections is integrated into the fine-tuning dataset for subsequent model training and optimization.

[0092] By following these three steps, valuable information can be effectively extracted from historical dialogue data and transformed into a fine-tuning dataset that helps improve the performance of the dialogue system.

[0093] Specifically, the key variables marked above are transformed into structured role control prompts, including basic commands and variable names and their values.

[0094] A general-purpose LLM is used to complete the user-side responses based on the retained robot-side statements and the converted prompts.

[0095] The generated data is checked for logical consistency and natural language, and manual corrections are made when necessary to improve quality.

[0096] S114. Based on the Transformer Decoder architecture, add a controllable variable embedding layer independent of text encoding to obtain an improved model architecture.

[0097] In this embodiment, a controllable variable embedding layer independent of text encoding is added to the Transformer Decoder architecture, thereby improving the model architecture. This design allows the model to generate corresponding response content based on the input role-controlled prompt words, enhancing its ability to generate diverse dialogues according to different contexts.

[0098] Specifically, based on the Transformer Decoder architecture, an independent, controllable variable embedding layer is added.

[0099] The controllable variables are first encoded by a tokenizer and then mapped to vectors through the independent embedding layer; these vectors are average-pooled to obtain an overall representation of the controllable variables.

[0100] The original word vectors of the input tokens and the controllable variable vector can be fused either by element-wise addition or by concatenation followed by a linear projection.
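
As a non-limiting illustration of the arithmetic described above, the following framework-free Python sketch looks up control-variable token ids in an independent embedding table, average-pools them, and fuses the result with the text token vectors in both of the described ways. All sizes, token ids, and function names are hypothetical, and the randomly initialized tables stand in for learned parameters; an actual implementation would typically use a deep learning framework inside the Transformer Decoder.

```python
import random


def make_table(rows, cols, rng):
    """Randomly initialized matrix standing in for learned parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fuse_add(token_vecs, control_vec):
    """Fusion option 1: add the control vector to every token vector element-wise."""
    return [[t + c for t, c in zip(tok, control_vec)] for tok in token_vecs]


def fuse_concat_project(token_vecs, control_vec, weight):
    """Fusion option 2: concatenate, then project back to dim with a (dim x 2*dim) matrix."""
    fused = []
    for tok in token_vecs:
        joined = tok + control_vec  # list concatenation, length 2*dim
        fused.append([sum(w * x for w, x in zip(row, joined)) for row in weight])
    return fused


rng = random.Random(0)
dim, vocab_size = 4, 10
text_table = make_table(vocab_size, dim, rng)     # ordinary token embeddings
control_table = make_table(vocab_size, dim, rng)  # independent embedding layer

token_ids = [1, 5, 7]   # tokenized dialogue text (hypothetical ids)
control_ids = [2, 3]    # tokenized controllable variables (hypothetical ids)

token_vecs = [text_table[i] for i in token_ids]
control_vec = mean_pool([control_table[i] for i in control_ids])

added = fuse_add(token_vecs, control_vec)
projection = make_table(dim, 2 * dim, rng)
projected = fuse_concat_project(token_vecs, control_vec, projection)
```

Both fusion options return one vector of size `dim` per input token, so either can feed the decoder layers without changing their input shape.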

[0101] S115. Perform supervised fine-tuning of the improved model architecture using the fine-tuning dataset, so that the improved model architecture generates corresponding response content based on the input role control prompts, thereby obtaining the role-playing model.

[0102] Finally, supervised fine-tuning is performed on the improved model architecture using the constructed fine-tuning dataset. In this way, the model learns how to generate dialogue responses that meet specific conditions based on the input role control prompts, ultimately forming a fully functional role-playing model.

[0103] Specifically, the improved model is subjected to supervised fine-tuning through multiple iterations using the fine-tuning dataset constructed in S113.

[0104] The goal is to enable the model to generate responses that meet specific conditions based on input role-control prompts, while maintaining a natural and consistent speaking style.

[0105] Regularly evaluate model performance, focusing on the quality, diversity, and consistency of generated dialogues. Adjust training strategies or model parameters based on the evaluation results.

[0106] By following these detailed steps, we not only enhance the model's ability to understand and generate diverse dialogues, but also strengthen its flexibility in adapting to different business scenarios and service needs. This meticulous approach helps build a more intelligent and practical role-playing model.

[0107] This process not only improves the quality and diversity of model-generated dialogues, but also enhances the model's ability to understand and simulate real-world dialogue scenarios. In particular, for high-risk, low-frequency long-tail business scenarios, the causal counterfactual inference method can proactively create diverse training data, thereby improving the overall performance and robustness of the model.

[0108] S120. Based on the modified key variables, a role-playing model is used to generate counterfactual dialogue to obtain the dialogue content.

[0109] In this embodiment, the dialogue content refers to the interactive dialogue between a virtual user and a customer service representative, generated through a role-playing model based on modified key variables and conforming to specific context settings (such as different emotions or business types).

[0110] In this embodiment, step S120 details how to generate counterfactual dialogues by adjusting key variables and using a role-playing model. This process aims to explore and simulate user responses in different contexts, thereby enriching the training dataset and improving the accuracy and robustness of the intent discrimination model.

[0111] In one embodiment, referring to Figure 4, the above-mentioned step S120 may include steps S121 to S122.

[0112] S121. Modify the actual values of the key variables and design new role control prompts.

[0113] This step requires intervention or modification of key causal variables in historical dialogues (such as user sentiment, business type, etc.) to create new scenarios or situations. Specific operations include:

[0114] Variable selection: Identify which key variables need to be modified. These variables should be factors that can significantly influence the direction of the conversation, such as the user's emotional state or the business context.

[0115] Variable modification: Modify the true values of the selected variables according to specific research objectives or experimental needs. For example, change "Emotion: Calm" to "Emotion: Excited", or change "Business Type: Financial Consulting" to "Business Type: Loan Complaint".

[0116] Prompt reconstruction: Based on the modified variable values, reconstruct the role control prompts. These prompts should include all necessary contextual information and the modified variable values so that the role-playing model can understand the scenario and generate appropriate dialogue.
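The three operations of step S121 can be illustrated with a minimal Python sketch. The function names `intervene` and `build_role_prompt`, the dictionary layout, and the prompt wording are hypothetical choices made for illustration; the embodiment does not prescribe a concrete prompt format.

```python
def intervene(variables: dict, changes: dict) -> dict:
    """Return a copy of the key variables with the selected ones overwritten."""
    unknown = set(changes) - set(variables)
    if unknown:
        # Only variables that were labeled on the historical dialogue can be modified.
        raise KeyError(f"cannot intervene on unlabeled variables: {sorted(unknown)}")
    return {**variables, **changes}


def build_role_prompt(variables: dict, scenario: str) -> str:
    """Rebuild a role control prompt from the (modified) variable values."""
    lines = [f"Scenario: {scenario}", "You play the user in this call."]
    lines += [f"{name}: {value}" for name, value in variables.items()]
    return "\n".join(lines)


# Values taken from the example in the text above.
original = {"Emotion": "Calm", "Business Type": "Financial Consulting"}
counterfactual = intervene(
    original, {"Emotion": "Excited", "Business Type": "Loan Complaint"}
)
prompt = build_role_prompt(counterfactual, scenario="outbound customer service call")
```

The original dictionary is left untouched, so the factual and counterfactual versions of the same dialogue context can be kept side by side.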

[0117] S122. Using a role-playing model combined with the role control prompts, simulate the dialogue between customer service and users, and generate virtual responses that conform to the intervention settings based on real values to obtain the dialogue content.

[0118] Specifically, when the robot initiates a dialogue, the role-playing model generates a corresponding user response based on the actual values, the robot's speech, and the role control prompts to obtain the dialogue content.

[0119] Next, a role-playing model will be used to simulate the interaction between customer service representatives and users. Based on the previously designed role control prompts, dialogue content will be generated to fit the new scenario. The specific process is as follows:

[0120] Initial dialogue: The robot initiates the dialogue, which can be a general greeting, an authentication request, or an opening statement introducing the product.

[0121] User-side response generation: Based on the background information provided by the robot's statements and role control prompts, the role-playing model automatically generates the user's response. It is emphasized here that the generated response should reflect the modified contextual characteristics (such as more emotionally charged expressions).

[0122] Multi-turn interaction simulation: Continue the above steps until a complete dialogue or multi-turn dialogue sequence is completed. Each round of dialogue must ensure that the role-playing model can accurately capture the current dialogue state and make logical, reasonable, and natural responses.

[0123] Dialogue content output: The final output is a series of virtual dialogues generated based on the modified key variables. These dialogues not only help to expand the coverage of the training dataset, especially for some rare but important scenarios, but also provide valuable resources for subsequent model training.
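The multi-turn simulation described above can be sketched as a simple loop. This is an illustration under stated assumptions: the robot side is reduced to a fixed script, and `stub_role_model` is a hypothetical placeholder standing in for the fine-tuned role-playing model, which in practice would be a call to an LLM.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (speaker, utterance)


def simulate_dialogue(
    robot_script: List[str],
    role_model: Callable[[str, List[Turn]], str],
    role_prompt: str,
) -> List[Turn]:
    """Alternate scripted robot turns with user turns produced by the role model."""
    history: List[Turn] = []
    for robot_utterance in robot_script:
        history.append(("robot", robot_utterance))
        # The model sees the control prompt plus the full history so far,
        # so each reply can depend on the current dialogue state.
        user_utterance = role_model(role_prompt, list(history))
        history.append(("user", user_utterance))
    return history


def stub_role_model(role_prompt: str, history: List[Turn]) -> str:
    """Hypothetical stand-in for the role-playing model; reflects the prompt's emotion."""
    emotion = "Excited" if "Emotion: Excited" in role_prompt else "Calm"
    return f"[{emotion}] reply to: {history[-1][1]}"


dialogue = simulate_dialogue(
    ["Hello, may I confirm your identity?", "We are calling about your loan."],
    stub_role_model,
    "Emotion: Excited",
)
```

Passing the model as a callable keeps the loop independent of any particular LLM interface.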

[0124] By performing the two steps described above—modifying key variables to design new role-control prompts and simulating dialogue between customer service representatives and users using a role-playing model—we can effectively generate diverse counterfactual dialogue data. This approach not only helps us better understand and handle complex dialogue scenarios but also provides a solid foundation for improving the performance of intent discrimination models.

[0125] S130. Input the dialogue content into the intent discrimination model, and by mining low-confidence samples and boundary confusion samples, and feeding the variant data of the mined samples back into the training process of the intent discrimination model, a dynamic game relationship between the role model and the intent discrimination model is established.

[0126] In one embodiment, referring to Figure 5, the above-mentioned step S130 may include steps S131 to S134.

[0127] S131. Input the dialogue content into the intent discrimination model so that the intent discrimination model can evaluate the classification result and confidence level of each dialogue in real time.

[0128] First, the dialogue content generated by the role-playing model is input into the intent discrimination model. During this process, the intent discrimination model analyzes each dialogue in real time, providing a classification result (i.e., user intent) and a confidence score for that classification. This step is crucial for identifying which samples the intent discrimination model struggles to classify accurately.

[0129] S132. Based on the output of the intent discrimination model, select samples with confidence levels that meet the requirements and boundary confusion samples as seed data to obtain difficult samples.

[0130] Based on the classification results and confidence scores obtained in the previous step, two particularly important types of samples are selected from all dialogue samples as seed data:

[0131] Low confidence samples: Although there are classification results, the confidence is below a set threshold (e.g., 0.5), indicating that the intention discrimination model is confused when dealing with these samples.

[0132] Boundary confusion samples: When the intent discrimination model predicts probabilities that are very close between two or more similar intents, such samples are considered boundary confusion samples.

[0133] These two types of samples are collectively referred to as "difficult samples," which represent the weaknesses and blind spots of the current intent discrimination model.
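The two selection criteria can be expressed directly on the predicted intent probabilities. The sketch below uses the example threshold of 0.5 from the text; the margin value `BOUNDARY_MARGIN`, the function names, and the intent labels are assumptions made for illustration, since the embodiment does not fix how "very close" probabilities are measured.

```python
from typing import Dict, List, Tuple

LOW_CONFIDENCE_THRESHOLD = 0.5  # example value given in the text
BOUNDARY_MARGIN = 0.1           # assumed: largest top-two gap still counted as confusion


def classify_difficulty(probs: Dict[str, float]) -> List[str]:
    """Return the difficulty tags that apply to one predicted intent distribution."""
    ranked = sorted(probs.values(), reverse=True)
    if not ranked:
        return []
    tags = []
    if ranked[0] < LOW_CONFIDENCE_THRESHOLD:
        tags.append("low_confidence")
    if len(ranked) > 1 and ranked[0] - ranked[1] < BOUNDARY_MARGIN:
        tags.append("boundary_confusion")
    return tags


def mine_hard_samples(
    predictions: List[Tuple[str, Dict[str, float]]]
) -> List[Tuple[str, List[str]]]:
    """Keep (dialogue, tags) pairs for every dialogue with at least one tag."""
    mined = []
    for dialogue, probs in predictions:
        tags = classify_difficulty(probs)
        if tags:
            mined.append((dialogue, tags))
    return mined


predictions = [
    ("d1", {"repay": 0.92, "complain": 0.05, "refuse": 0.03}),  # confident
    ("d2", {"repay": 0.40, "complain": 0.35, "refuse": 0.25}),  # low and close
    ("d3", {"repay": 0.52, "complain": 0.46, "refuse": 0.02}),  # close top two
]
hard = mine_hard_samples(predictions)
```

A sample can carry both tags at once, as `d2` does here; keeping the tags lets later steps treat the two kinds of difficulty differently if desired.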

[0134] S133. Feed the difficult samples back to the role model to generate variant data.

[0135] Next, the role-playing model is used to generate variants of the selected difficult samples. By modifying key variables (such as emotion, business type, etc.), the role-playing model can generate a series of new dialogue scenarios. These variants not only increase the diversity of the training data, but also specifically target the weak areas of the intent discrimination model, providing high-quality data support for subsequent training.

[0136] S134. After the difficult samples and variant data have been validated by the model, they are added to the training set. The intention discrimination model is then trained again using the training set to establish a dynamic game relationship between the role model and the intention discrimination model.

[0137] Finally, difficult samples and their variants, validated manually or by advanced models, are added to the training set to retrain the intent discrimination model. This helps strengthen the model's ability to handle low-confidence samples and samples with confusing boundaries, thereby gradually clarifying the classification boundaries and improving the model's overall robustness and accuracy. Through multiple rounds of "generation-mining-retraining" cycles, a dynamic game is formed between the role-playing model and the intent discrimination model, with both promoting each other and co-evolving, ultimately achieving a significant performance improvement.
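The "generation-mining-retraining" cycle can be outlined as a loop over pluggable components. This is a structural sketch only: every callable below is a hypothetical placeholder for the corresponding model or review step, the stopping rule is an assumption, and the updating of the role model itself is not modeled.

```python
from typing import Callable, List, Sequence


def adversarial_rounds(
    seeds: Sequence[str],
    generate_variants: Callable[[str], List[str]],
    is_hard: Callable[[str], bool],
    validate: Callable[[str], bool],
    retrain: Callable[[List[str]], None],
    rounds: int = 3,
) -> List[str]:
    """Run mine -> generate -> validate -> retrain cycles; return all samples added."""
    added: List[str] = []
    pool = list(seeds)
    for _ in range(rounds):
        # Mining: keep only what the discriminator currently struggles with.
        hard = [d for d in pool if is_hard(d)]
        if not hard:
            break  # nothing in the pool is difficult any more
        # Targeted generation: variants of each difficult sample.
        variants = [v for d in hard for v in generate_variants(d)]
        # Validation (manual or by a stronger model) before anything is trained on.
        accepted = [s for s in hard + variants if validate(s)]
        retrain(accepted)
        added.extend(accepted)
        pool = variants  # the next round probes the freshly generated variants
    return added


# Toy stand-ins: the "discriminator" is a set of dialogues it already handles.
known = {"b"}
added = adversarial_rounds(
    seeds=["a", "b"],
    generate_variants=lambda d: [d + "!"],
    is_hard=lambda d: d not in known,
    validate=lambda s: True,
    retrain=known.update,
)
```

In the toy run the loop stops after one round, because retraining makes the generated variant no longer difficult; with real models several rounds would typically be needed.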

[0138] This method not only effectively solves the problem of the lack of specificity in traditional random generation methods, but also establishes a closed-loop mechanism from discriminative feedback to targeted generation and then to model optimization, which greatly improves training efficiency and model quality.

[0139] In this embodiment, the method utilizes difficult samples at the classification boundary to "attack" the intent discrimination model. The errors of the intent discrimination model serve as feedback signals to promote the updating of the role model, thus creating a dynamic game between the two and achieving bidirectional reinforcement. By "intervening" in key causal variables (such as user emotion and business type) in historical dialogues, counterfactual dialogue paths of "what if..." are created, systematically expanding the number of rare scenarios and intent boundary samples. This method can generate diverse and highly challenging training data, effectively improving the accuracy and robustness of the intent discrimination model in actual outbound calling tasks.

[0140] The role-playing model, acting as a generator, is capable of accepting interventions from causal variables and is used to generate diverse and counterfactual dialogue data. The intent discrimination model, acting as a discriminator, is the core natural language understanding component of the outbound calling system, used to identify user intent and provide feedback on the confidence level of the identification.

[0141] The main objective of this embodiment is to cover long-tail scenarios by generating virtual "counterfactual" data through causal intervention, and to strengthen the model's boundaries by mining the blind spots (i.e., difficult samples) of the intent model and generating targeted training data. First, high-quality samples are selected from anonymized historical human-computer dialogue texts, as these records may contain noise such as intent judgment errors, ASR recognition errors, or logical inconsistencies. Key variables affecting the dialogue's direction are extracted or labeled, including user state variables (such as age, identity, and emotion) and business scenario variables (such as processing intent and business type). The extracted "controllable variables" are transformed into structured role control prompts, and a general LLM is used to complete the dialogue based on different prompts. Controllable variable embeddings independent of text token embeddings are introduced to enhance the role model's ability to generate different dialogue paths based on different externally specified controllable variables. The LLM is then fine-tuned under supervision using the constructed dataset, enabling the trained model to generate responses that meet specific conditions based on the input control variables.
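The idea of controllable variable embeddings that are independent of text token embeddings can be illustrated with a toy, dependency-free sketch. The additive fusion rule, the table contents, and all names here are assumptions for illustration; the embodiment states only that the control embeddings are separate from the text token embeddings, not how the two are combined.

```python
import random
from typing import Dict, List

DIM = 4  # toy embedding width


def make_table(keys: List[str], seed: int) -> Dict[str, List[float]]:
    """Create a small random embedding table (stand-in for learned weights)."""
    rng = random.Random(seed)
    return {k: [rng.uniform(-1, 1) for _ in range(DIM)] for k in keys}


token_table = make_table(["hello", "loan", "no"], seed=0)
# Separate table: control values never share rows with text tokens.
control_table = make_table(["emotion=calm", "emotion=excited"], seed=1)


def embed(tokens: List[str], controls: List[str]) -> List[List[float]]:
    """Add the summed control embedding to every token embedding (assumed fusion)."""
    control_vec = [sum(control_table[c][i] for c in controls) for i in range(DIM)]
    return [[t + c for t, c in zip(token_table[tok], control_vec)] for tok in tokens]


calm = embed(["hello", "loan"], ["emotion=calm"])
excited = embed(["hello", "loan"], ["emotion=excited"])
```

The same token sequence yields different input vectors under different control values, which is what allows a decoder to be steered by externally specified variables rather than by the text alone.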

[0142] Next, a counterfactual generation strategy is employed to explore the boundaries of the dialogue and address the issue of insufficient data diversity. The true values of controllable variables are modified, and new role control prompts are generated based on these modifications. Using the modified variables, the role model plays the user in a dialogue with the customer service robot and generates counterfactual virtual responses.

[0143] Finally, by identifying weaknesses in the intent model to guide data generation, a dynamic game relationship is established between the role-playing model and the intent discrimination model. Low-confidence samples and boundary-confusion samples are selected as seed data based on the discrimination results. Difficult samples are fed back to the role-playing model to generate more variations, and after manual or advanced model validation, they are added to the training set for targeted retraining of the intent discrimination model.

[0144] Through multiple iterations, the classification boundaries of the intention discrimination model become clearer, and its robustness is significantly improved. This closed-loop mechanism not only improves training efficiency and model quality, but also achieves a low-cost, privacy-risk-free, high-fidelity training ecosystem.

[0145] The method in this embodiment aims to construct counterfactual dialogue paths based on "what if..." by intervening in key variables (such as emotion, user profile, etc.) in the dialogue history. This method differs fundamentally from traditional random generation methods. It enables the system to proactively create high-risk, low-frequency long-tail business scenarios in the absence of real data, thereby significantly improving data completeness.

[0146] Specifically, key variables in the dialogue history (such as user emotions or specific user profiles) are selected for intervention. Based on these intervened variables, different dialogue scenarios are simulated, i.e., "what if..." scenarios. This allows the system to explore and generate edge cases that occur very rarely in reality but are crucial for model training, even without supporting real-world data.

[0147] Furthermore, the method in this embodiment also constructs a closed-loop optimization mechanism based on hard sample mining, changing the traditional practice of separating data generation and model training. This mechanism includes the following aspects:

[0148] Discriminant model feedback: The intent discriminant model is used to identify low-confidence regions and confusion boundaries as feedback signals.

[0149] Targeted generation of character models: Based on the feedback above, the character model can generate more challenging training samples in a targeted manner, maximizing the improvement of model performance with minimal data increment.

[0150] To ensure data security and compliance, while reducing data acquisition costs, the method in this embodiment achieves the following characteristics:

[0151] It does not rely on real user privacy data: All data used for training does not contain real user privacy information, but is created through a combination of controllable large model generation and manual verification.

[0152] Industrial-grade training data stream: This method uses synthetic data to provide quality comparable to manual annotation while maintaining data complexity and diversity, but at a lower cost and higher efficiency.

[0153] Currently, the method described in this embodiment has been deployed and validated in real-world systems, demonstrating particularly good performance in intelligent outbound call robot systems. By constructing a large-scale synthetic dataset and utilizing adversarial co-evolution and causal counterfactual generation mechanisms, continuous model iteration is achieved. Its main objective is to address the problems of scarce real business data, insufficient coverage of long-tail scenarios, and poor robustness of model recognition in sensitive fields such as finance.

[0154] In summary, the method presented in this embodiment not only improves the model's ability to handle long-tail scenarios but also achieves an efficient and economical data generation and model training process without infringing on user privacy. This innovation provides strong support for improving the accuracy and reliability of artificial intelligence systems in various application scenarios.

[0155] The aforementioned intention discrimination model training method, through a dynamic game mechanism and intervention in key variables, achieves the goal of generating more diverse and challenging training data. Specifically, it first filters dialogue samples and extracts key variables to construct an instruction fine-tuning dataset, improving the model architecture to support the generation of different dialogue paths based on externally specified variables, thereby training a role-playing model. Then, based on the modified key variables, the role-playing model is used to generate counterfactual dialogues, simulating "what if" scenarios, particularly expanding samples for rare scenarios and intention boundaries. Finally, the generated dialogue content is input into the intention discrimination model. By mining low-confidence samples and boundary-confused samples, and feeding the variations of these samples back into the model training process, a dynamic game relationship is established between the role model and the intention discrimination model. This allows the model to systematically learn a wider range of more complex situations, significantly improving its performance and reliability in practical applications. This method effectively solves the problems of scarce real-world business data and insufficient coverage of long-tail scenarios, improving the model's ability to handle various situations.

[0156] Figure 6 is a schematic block diagram of an intent discrimination model training device 300 provided in an embodiment of the present invention. As shown in Figure 6, corresponding to the above-described intention discrimination model training method, the present invention also provides an intention discrimination model training apparatus 300. This intention discrimination model training apparatus 300 includes units for executing the above-described intention discrimination model training method, and the apparatus can be configured in a server. Specifically, referring to Figure 6, the intent discrimination model training device 300 includes a role-playing model training unit 301, a dialogue content generation unit 302, and an iterative optimization unit 303.

[0157] The role-playing model training unit 301 is used to filter dialogue samples and extract key variables to construct an instruction fine-tuning dataset, improve the model architecture to support the generation of different dialogue paths based on externally specified variables, and train the improved model architecture based on the dataset to obtain the role-playing model; the dialogue content generation unit 302 is used to generate counterfactual dialogues based on the modified key variables using the role-playing model to obtain dialogue content; the iterative optimization unit 303 is used to input the dialogue content into the intent discrimination model, and to establish a dynamic game relationship between the role model and the intent discrimination model by mining low-confidence samples and boundary confusion samples, and feeding back the variant data of the mined samples into the training process of the intent discrimination model.

[0158] In one embodiment, the role-playing model training unit 301 includes:

[0159] The system comprises the following sub-units: a selection sub-unit, which selects high-quality samples from de-identified historical dialogue texts that are free from intent judgment errors, speech recognition errors, or logical inconsistencies; an annotation sub-unit, which annotates key variables affecting the dialogue flow in the high-quality samples; a dataset construction sub-unit, which converts the key variables into role control prompts and uses a general LLM to complete the user's speech based on the role control prompts, and performs validation to obtain a fine-tuned dataset; an improvement sub-unit, which adds a controllable variable embedding layer independent of text encoding based on the Transformer Decoder architecture to obtain an improved model architecture; and a fine-tuning training sub-unit, which uses the fine-tuned dataset to perform supervised fine-tuning on the improved model architecture, enabling the improved model architecture to generate corresponding response content based on the input role control prompts, thus obtaining a role-playing model.

[0160] In one embodiment, the dataset construction sub-unit includes:

[0161] The transformation module is used to transform the key variables into structured role control prompts; the completion module is used to retain the robot's speech and use a general LLM to complete the user's speech based on the provided role control prompts to obtain the user's speech content; the verification module is used to verify the user's speech content to obtain a fine-tuned dataset.
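The three modules can be pictured as one small pipeline per dialogue. In the sketch below, `complete_user_turn` stands in for the general LLM and `is_valid` for the verification step; both are hypothetical callables, and the record layout and the choice to reject a whole record on one failed turn are assumptions rather than something the embodiment specifies.

```python
from typing import Callable, Dict, List, Optional


def build_finetune_record(
    key_variables: Dict[str, str],
    robot_turns: List[str],
    complete_user_turn: Callable[[str, str], str],
    is_valid: Callable[[str], bool],
) -> Optional[dict]:
    """Turn one labeled dialogue into a fine-tuning record, or None if invalid."""
    # Transformation: key variables -> structured role control prompt.
    prompt = "; ".join(f"{k}={v}" for k, v in sorted(key_variables.items()))
    turns = []
    for robot_text in robot_turns:
        # Completion: robot speech is kept verbatim, the user side is generated.
        user_text = complete_user_turn(prompt, robot_text)
        # Verification: reject the whole record if any completion fails the check.
        if not is_valid(user_text):
            return None
        turns.append({"robot": robot_text, "user": user_text})
    return {"role_control_prompt": prompt, "turns": turns}


record = build_finetune_record(
    {"emotion": "calm", "business_type": "loan"},
    ["Hello, is this a good time?", "Your repayment is due on Friday."],
    complete_user_turn=lambda prompt, robot: f"({prompt}) understood",
    is_valid=lambda text: len(text) > 0,
)
```

Sorting the variables before building the prompt keeps the prompt text stable regardless of the order in which the annotations were stored.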

[0162] In one embodiment, the dialogue content generation unit 302 includes:

[0163] The modification subunit is used to modify the actual values of the key variables and design new role control prompts; the simulation subunit is used to simulate the dialogue between customer service and users using a role-playing model combined with the role control prompts, and generate virtual responses that conform to the intervention settings based on the actual values to obtain the dialogue content.

[0164] In one embodiment, the simulation subunit is used to generate corresponding user responses from the role-playing model based on real values, robot speech, and role control prompts when the robot initiates a dialogue, so as to obtain the dialogue content.

[0165] In one embodiment, the iterative optimization unit 303 includes:

[0166] The evaluation subunit is used to input the dialogue content into the intent discrimination model, so that the intent discrimination model can evaluate the classification result and confidence of each dialogue in real time; the data filtering subunit is used to filter samples with confidence that meet the requirements and boundary confusion samples as seed data according to the output of the intent discrimination model to obtain difficult samples; the feedback subunit is used to feed the difficult samples back to the role model to generate variant data; the retraining subunit is used to add the difficult samples and variant data to the training set after model validation, and use the training set to retrain the intent discrimination model to establish a dynamic game relationship between the role model and the intent discrimination model.

[0167] It should be noted that those skilled in the art can clearly understand that the specific implementation process of the above-mentioned intent discrimination model training device 300 and each unit can be referred to the corresponding description in the foregoing method embodiments. For the sake of convenience and brevity, it will not be repeated here.

[0168] The aforementioned intent discrimination model training device 300 can be implemented as a computer program, which can run on a computer device such as the one shown in Figure 7.

[0169] Figure 7 is a schematic block diagram of a computer device provided in an embodiment of this application. The computer device 500 can be a server, wherein the server can be a standalone server or a server cluster composed of multiple servers.

[0170] Referring to Figure 7, the computer device 500 includes a processor 502, a memory, and a network interface 505 connected via a system bus 501. The memory may include a non-volatile storage medium 503 and internal memory 504.

[0171] The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032 includes program instructions that, when executed, cause the processor 502 to perform an intent discrimination model training method.

[0172] The processor 502 provides computing and control capabilities to support the operation of the entire computer device 500.

[0173] The internal memory 504 provides an environment for the execution of the computer program 5032 in the non-volatile storage medium 503. When the computer program 5032 is executed by the processor 502, the processor 502 can execute an intent discrimination model training method.

[0174] The network interface 505 is used for network communication with other devices. Those skilled in the art will understand that the structure shown in Figure 7 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the computer device 500 to which the present application is applied. A specific computer device 500 may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.

[0175] The processor 502 is used to run the computer program 5032 stored in the memory to implement all the steps of the intent discrimination model training method.

[0176] It should be understood that in the embodiments of this application, the processor 502 may be a central processing unit (CPU), or it may be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor.

[0177] It will be understood by those skilled in the art that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program includes program instructions and can be stored in a storage medium, which is a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the process steps of the embodiments of the above methods.

[0178] Therefore, the present invention also provides a storage medium. This storage medium may be a computer-readable storage medium. The storage medium stores a computer program, wherein when executed by a processor, the computer program causes the processor to perform all steps of the intent discrimination model training method.

[0179] The storage medium can be any computer-readable storage medium capable of storing program code, such as a USB flash drive, portable hard drive, read-only memory (ROM), magnetic disk, or optical disk.

[0180] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.

[0181] In the several embodiments provided by this invention, it should be understood that the disclosed apparatus and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. For example, the division of each unit is merely a logical functional division, and there may be other division methods in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.

[0182] The steps in the method of this invention can be adjusted, merged, or reduced in order according to actual needs. The units in the device of this invention can be merged, divided, or reduced according to actual needs. Furthermore, the functional units in the various embodiments of this invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.

[0183] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a terminal, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention.

[0184] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present invention, and these modifications or substitutions should all be covered within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A method of training an intent discrimination model, the method comprising: filtering dialogue samples and extracting key variables to construct an instruction fine-tuning dataset, improving the model architecture to support the generation of different dialogue paths based on externally specified variables, and training the improved model architecture based on the dataset to obtain a role-playing model; performing counterfactual dialogue generation using the role-playing model based on the modified key variables to obtain dialogue content; and inputting the dialogue content into the intent discrimination model, mining low-confidence samples and boundary confusion samples, and feeding the variant data of the mined samples back into the training process of the intent discrimination model to establish a dynamic game relationship between the role model and the intent discrimination model.

2. The intention discrimination model training method according to claim 1, characterized in that the filtering of dialogue samples and extracting of key variables to construct an instruction fine-tuning dataset, improving the model architecture to support the generation of different dialogue paths based on externally specified variables, and training the improved model architecture based on the dataset to obtain a role-playing model includes: selecting, from the anonymized historical dialogue texts, high-quality samples that are free from intent judgment errors, speech recognition errors, or logical inconsistencies; labeling, in the high-quality samples, key variables affecting the dialogue flow; converting the key variables into role control prompts, completing the user's speech content based on the role control prompts using a general LLM, and verifying it to obtain a fine-tuning dataset; adding, based on the Transformer Decoder architecture, a controllable variable embedding layer independent of text encoding to obtain an improved model architecture; and performing supervised fine-tuning on the improved model architecture using the fine-tuning dataset, so that the improved model architecture generates corresponding responses based on the input role control prompts, thereby obtaining the role-playing model.

3. The intention discrimination model training method according to claim 2, characterized in that the key variables include user attributes and business scenario characteristics.

4. The intention discrimination model training method according to claim 1, characterized in that converting the key variables into role control prompts, completing the user's speech content based on the role control prompts using a general LLM, and verifying it to obtain a fine-tuning dataset includes: transforming the key variables into structured role control prompts; retaining the robot's speech, and completing the user's speech using a general LLM based on the provided role control prompts to obtain the user's speech content; and verifying the user's speech content to obtain the fine-tuning dataset.

5. The intention discrimination model training method according to claim 1, characterized in that generating counterfactual dialogues using the role-playing model based on the modified key variables to obtain dialogue content includes: modifying the actual values of the key variables and designing new role control prompts; and using the role-playing model combined with the role control prompts to simulate a dialogue between customer service and the user, and generating virtual responses that conform to the intervention settings based on the actual values to obtain the dialogue content.

6. The intention discrimination model training method according to claim 5, characterized in that using the role-playing model combined with the role control prompts to simulate a dialogue between customer service and the user, and generating virtual responses that conform to the intervention settings based on the actual values to obtain the dialogue content, includes: when the robot initiates a conversation, generating, by the role-playing model, a corresponding user response based on the actual values, the robot's speech, and the role control prompts to obtain the dialogue content.

7. The intention discrimination model training method according to claim 1, characterized in that inputting the dialogue content into the intent discrimination model, mining low-confidence samples and boundary confusion samples, and feeding the variant data of the mined samples back into the training process of the intent discrimination model to establish a dynamic game relationship between the role model and the intent discrimination model includes: inputting the dialogue content into the intent discrimination model so that the intent discrimination model evaluates the classification result and confidence level of each dialogue in real time; selecting, based on the output of the intent discrimination model, samples with confidence levels that meet the requirements and boundary confusion samples as seed data to obtain difficult samples; feeding the difficult samples back into the role model to generate variant data; and, after the difficult samples and variant data are validated by the model, adding them to the training set and training the intention discrimination model again using the training set to establish a dynamic game relationship between the role model and the intention discrimination model.

8. An intention discrimination model training device, characterized in that it comprises: a role-playing model training unit, used to screen dialogue samples and extract key variables to construct an instruction fine-tuning dataset, improve the model architecture to support the generation of different dialogue paths based on externally specified variables, and train the improved model architecture based on the dataset to obtain the role-playing model; a dialogue content generation unit, used to generate counterfactual dialogues based on the modified key variables using the role-playing model, so as to obtain dialogue content; and an iterative optimization unit, used to input the dialogue content into the intent discrimination model, and to establish a dynamic game relationship between the role-playing model and the intent discrimination model by mining low-confidence samples and boundary confusion samples and feeding the variant data of the mined samples back into the training process of the intent discrimination model.

9. The intention discrimination model training device according to claim 8, characterized in that the role-playing model training unit comprises: a selection subunit, used to select, from the anonymized historical dialogue text, high-quality samples that are free from intent judgment errors, speech recognition errors, and logical inconsistencies; an annotation subunit, used to annotate the key variables that affect the dialogue flow in the high-quality samples; a dataset construction subunit, used to convert the key variables into role control prompts, use a general LLM to complete the user's utterances based on the role control prompts, and perform verification, so as to obtain a fine-tuning dataset; an improvement subunit, used to add, on the basis of the Transformer Decoder architecture, a controllable variable embedding layer that is independent of text encoding, so as to obtain the improved model architecture; and a fine-tuning training subunit, used to perform supervised fine-tuning of the improved model architecture using the fine-tuning dataset, so that the improved model architecture generates corresponding response content based on the input role control prompts, thereby obtaining the role-playing model.
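
The idea of a controllable variable embedding layer kept separate from text encoding can be illustrated with a small pure-Python sketch. The claim does not say how the variable embedding is combined with the token representation; this sketch assumes the simplest option, adding a summed control vector to every token embedding before the decoder. The class name, vocabulary, variable values, and dimensions are all hypothetical, and the random tables stand in for learned parameters.

```python
import random
from typing import Dict, List

Vector = List[float]

class ControlledEmbedding:
    """Token embeddings plus a separate table for control-variable values."""

    def __init__(self, vocab: List[str], variable_values: List[str],
                 dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def make_table(keys: List[str]) -> Dict[str, Vector]:
            return {k: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for k in keys}

        self.dim = dim
        self.token_table = make_table(vocab)                # text side
        self.variable_table = make_table(variable_values)   # control side

    def embed(self, tokens: List[str], active_values: List[str]) -> List[Vector]:
        """Add the summed control vector to each token embedding (assumed scheme)."""
        control = [0.0] * self.dim
        for value in active_values:
            control = [c + x for c, x in zip(control, self.variable_table[value])]
        return [
            [t + c for t, c in zip(self.token_table[tok], control)]
            for tok in tokens
        ]

layer = ControlledEmbedding(
    vocab=["please", "pay"],
    variable_values=["emotion=calm", "emotion=angry"],
    dim=4,
)
calm_inputs = layer.embed(["please", "pay"], ["emotion=calm"])
angry_inputs = layer.embed(["please", "pay"], ["emotion=angry"])
```

The same token sequence yields different decoder inputs under different variable values, which is what lets an externally specified variable steer generation without rewriting the text itself.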

10. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program, and the processor implementing the method according to any one of claims 1 to 7 when executing the computer program.