Training sample generation and model training method and device, storage medium and product
Raw dialogue material is processed automatically: it is decomposed into question-answer pairs and checked with a language model and an external knowledge base. This addresses the logical conflicts and factual errors produced by large language models, enables efficient, low-cost hallucination detection and correction, and improves the model's detection accuracy and reliability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HANGZHOU ANT KUAI TECHNOLOGY CO LTD
- Filing Date
- 2026-01-13
- Publication Date
- 2026-06-12
AI Technical Summary
Large language models are prone to logical conflicts or factual errors when generating content. Existing hallucination detection methods rely on manual annotation, which is costly and has a narrow coverage, making it difficult to adapt to diverse and dynamically changing application scenarios.
Raw dialogue material is processed automatically: target entities are extracted and the material is decomposed into question-answer pairs for logical consistency detection. The results are used to generate training samples for training a hallucination detection model and a hallucination processing model, with a language model and an external knowledge base providing automated recognition and correction.
It achieves efficient and low-cost generation of training samples, improves the accuracy and coverage of hallucination detection, reduces the burden of manual annotation, and improves the reliability and efficiency of model output.
Figure CN121525752B_ABST
Abstract
Description
Technical Field
[0001] This specification relates to the field of data synthesis technology, and more particularly to a training sample generation method, a training method for a hallucination detection model, a training method for a hallucination processing model, an electronic device, a computer-readable storage medium, and a computer program product.
Background Technology
[0002] Large Language Models (LLMs) have demonstrated enormous application potential in numerous professional service scenarios, such as online customer service, intelligent question answering, and content generation. However, the inherent limitations of these models—the potential to generate seemingly reasonable but ultimately misleading content that contradicts established facts, internal knowledge, or common sense—pose challenges to their reliable deployment in real-world scenarios. Such inaccurate or misleading outputs not only harm user experience but, in fields with high rigor, can also trigger trust crises and operational risks.
[0003] Currently, the mainstream method for detecting hallucinations in model output relies on manual annotation. Specifically, experts are required to review, judge, and annotate massive amounts of model-generated text one by one to identify errors. This approach has several limitations. First, manual annotation is extremely costly, especially in specialized fields, where the expertise required of annotators further increases both labor and time costs. Second, this method struggles to cover diverse and dynamically changing real-world application scenarios, resulting in a narrow and rigid detection scope.
Summary of the Invention
[0004] In view of the above, one or more embodiments of this specification provide the following technical solutions:
[0005] According to a first aspect of one or more embodiments of this specification, a training sample generation method is proposed, comprising:
[0006] Obtain the original dialogue material of the target application scenario, wherein the original dialogue material includes the model input question and the model output response;
[0007] Named entity recognition is performed on the model's output response to extract N target entities related to the target application scenario, where N > 0;
[0008] Based on the N target entities and the relationships between different target entities, the original dialogue material is decomposed into multiple question-answer pairs, such that each question-answer pair corresponds to a target entity or a group of target entities with related relationships.
[0009] The multiple question-answer pairs are input into the language model respectively, so that the language model performs logical consistency detection on each question-answer pair and outputs the corresponding detection results. The detection results include: a combination of inconsistent conclusions and hallucination types, or a consistent conclusion.
[0010] Based on each question-answer pair and the corresponding detection results, the first training sample is generated for training the hallucination detection model.
[0011] According to a second aspect of one or more embodiments of this specification, a method for training a hallucination detection model is proposed, comprising:
[0012] Obtain a first training sample, which includes question-answer pairs as model input and labeled hallucination identifiers as supervision labels, the labeled hallucination identifiers including a first identifier indicating that no hallucination is present and a second identifier indicating any type of hallucination.
[0013] The question-and-answer pair is input into the hallucination detection model to be trained, and the predicted hallucination identifiers output by the hallucination detection model are obtained.
[0014] The parameters of the hallucination detection model are adjusted with the optimization objective of minimizing the difference between the predicted hallucination identifier and the labeled hallucination identifier.
[0015] According to a third aspect of the embodiments of this specification, a method for training a hallucination processing model is provided, comprising:
[0016] Obtain a second training sample, which includes an input sample and a supervision label; the input sample includes at least one of a first combination, a second combination, and a third combination, wherein the first combination includes model input information and model output response, the second combination includes the model input information, the model output response, and reflection content, and the third combination includes the model input information, the model output response, the reflection content, and reference information; the supervision label includes a corrected response;
[0017] The input sample is input into the hallucination processing model to be trained, and the prediction result output by the hallucination processing model is obtained.
[0018] The parameters of the hallucination processing model are adjusted with the optimization objective of minimizing the difference between the predicted result and the corrected response.
[0019] According to a fourth aspect of the embodiments of this specification, a method for training a hallucination processing model is provided, comprising:
[0020] Obtain a second training sample, which includes an input sample, a secondary supervision label, and a primary supervision label; the input sample includes at least one of the following: a model input question from the original dialogue material, a first combination, and a second combination; the first combination includes the model input question and reflection content, and the second combination includes the model input question, the reflection content, and reference information; the secondary supervision label includes the model output response; the primary supervision label includes the corrected response;
[0021] The input samples are respectively input into the hallucination processing model to be trained and the benchmark model to obtain the prediction results output by the hallucination processing model and the reference results output by the benchmark model.
[0022] Provided that the difference in output distribution between the predicted result and the reference result does not exceed a preset threshold, the parameters of the hallucination processing model are adjusted with the optimization objective of making the predicted result closer to the primary supervision label and farther away from the secondary supervision label.
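As a rough illustration of the objective in the fourth aspect, the sketch below uses a DPO-style preference loss. This is an assumption: the specification does not name a specific loss, and it describes a threshold on the output-distribution difference, whereas this loss only discourages drift from the benchmark model softly through the hypothetical coefficient `beta`. The log-probabilities are placeholder numbers, not model outputs.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_loss(
    policy_primary_logp: float,    # log-prob of the corrected response under the model being trained
    policy_secondary_logp: float,  # log-prob of the original model output response under the same model
    ref_primary_logp: float,       # the same two quantities under the frozen benchmark model
    ref_secondary_logp: float,
    beta: float = 0.1,             # hypothetical strength of the pull toward the benchmark model
) -> float:
    """DPO-style loss: -log(sigmoid(beta * margin)), written as softplus(-beta * margin)."""
    margin = (policy_primary_logp - ref_primary_logp) - (
        policy_secondary_logp - ref_secondary_logp
    )
    return softplus(-beta * margin)


# Placeholder log-probabilities for one training sample.
loss_at_start = preference_loss(-1.0, -1.0, -1.0, -1.0)
loss_after_update = preference_loss(-0.5, -2.0, -1.0, -1.0)
```

The loss falls as the trained model assigns relatively more probability to the primary supervision label and less to the secondary one, measured against the benchmark model.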
[0023] According to a fifth aspect of the embodiments of this specification, an electronic device is provided, comprising:
[0024] A processor;
[0025] A memory for storing processor-executable instructions;
[0026] Wherein, when the processor executes the executable instructions, it is used to implement the method described in the first aspect.
[0027] According to a sixth aspect of the embodiments of this specification, a computer-readable storage medium is provided having a computer program stored thereon that, when executed by a processor, implements the steps of the method described in the first aspect.
[0028] According to a seventh aspect of the embodiments of this specification, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps of the method described in the first aspect.
[0029] As described in the above embodiments, this specification acquires the original dialogue material of the target application scenario, including the model input question and its corresponding model output response. Then, by performing named entity recognition on the model output response, N target entities related to the target application scenario are automatically extracted. Based on this, and according to these target entities and the semantic relationships between them, the original dialogue material is decomposed into multiple question-answer pairs. Each question-answer pair focuses on verifying an independent target entity or a group of closely related target entities, thereby achieving fine-grained deconstruction of the dialogue content.
[0030] Subsequently, these question-answer pairs are input into a language model for logical consistency checks to obtain the detection results. Finally, based on each question-answer pair and its detection results, the first training sample for training the hallucination detection model is automatically generated. This achieves full automation from raw dialogue material to training samples, significantly reducing manual annotation costs and improving data generation efficiency. By decomposing complex raw dialogue material into question-answer pairs targeting specific entities or combinations of entities, fine-grained hallucination detection is achieved, improving detection accuracy and providing fine-grained learning material for the hallucination detection model.
[0031] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this specification.
Attached Figure Description
[0032] Figure 1 is a flowchart of a training sample generation method provided in an exemplary embodiment.
[0033] Figure 2 is a flowchart of a training method for a hallucination detection model provided in an exemplary embodiment.
[0034] Figure 3 is a flowchart of a training method for a hallucination processing model provided in an exemplary embodiment.
[0035] Figure 4 is a flowchart of a training method for a hallucination processing model provided in an exemplary embodiment.
[0036] Figure 5 is a schematic diagram of the structure of a device provided in an exemplary embodiment.
Detailed Implementation
[0037] To enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this specification, and not all embodiments. Based on the embodiments in this specification, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of this specification.
[0038] The user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this specification are all information and data authorized by the user or fully authorized by all parties. The collection, use and processing of related data shall comply with the relevant laws, regulations and standards of the relevant countries and regions, and corresponding operation portals shall be provided for users to choose to authorize or refuse.
[0039] Because large language models are prone to generating outputs containing logical inconsistencies or factual errors in complex scenarios such as finance and customer service, and because manually annotating hallucination samples is costly, this specification provides a training sample generation method. This method can automatically analyze original dialogue materials based on a language model and identify potential hallucination problems, achieving efficient and scalable training data construction. This method can be executed by electronic devices, including but not limited to servers (single-machine servers, cluster servers, cloud servers), desktop computers, portable computers, smart terminals (smartphones, tablets), and edge computing devices.
[0040] This electronic device possesses the hardware foundation for data storage, model deployment, and data processing, which may include: ① processors, such as central processing units, graphics processing units, and dedicated AI processors; ② memory, such as random access memory, read-only memory, and solid-state drives; ③ communication interfaces, such as wired communication modules and wireless communication modules. It also includes the software environment for loading and running the language model, such as an operating system, deep learning frameworks, and model inference engines. The electronic device can independently execute the aforementioned training sample generation methods, or it can collaborate through a distributed architecture, where the server is responsible for model inference and sample generation, while the terminal is responsible for sample collection and initial screening.
[0041] Referring to Figure 1, the training sample generation method includes:
[0042] In S100, the original dialogue material of the target application scenario is obtained. The original dialogue material includes the model input question and the model output response.
[0043] The model input question includes either the question to be answered alone, or a combination of the question to be answered and relevant contextual information.
[0044] For example, data can be collected from the output data of large language models in real-world application systems, such as customer interaction content, question-and-answer records, and generated recommended answers. By aggregating these model outputs from target application scenarios, a rich and diverse contextual dataset can be formed, providing an input foundation for subsequent steps.
[0045] This step utilizes real-world interaction data from existing models as input, enabling it to cover diverse application contexts and problem types, thereby improving the representativeness and robustness of subsequent training samples; at the same time, it avoids relying entirely on manually compiled data, reducing the time and cost of sample construction.
[0046] In S102, named entity recognition is performed on the model's output response to extract N target entities related to the target application scenario, where N > 0.
[0047] This step uses Named Entity Recognition (NER) technology to perform named entity recognition on the model's output response, in order to extract target entities relevant to the target application scenario. For example, in a financial scenario, target entities include customer name, transaction amount, date, and product name. In an educational scenario, target entities include knowledge points, exam names, and subjects.
[0048] In this step, named entity recognition transforms the originally unstructured language model output into structured semantic features, providing basic semantic units for subsequent logical consistency detection.
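To make the entity extraction in S102 concrete, here is a minimal sketch that substitutes regular expressions for a trained NER model. The patterns, the `Entity` type, and the sample sentence are all hypothetical and are not taken from the specification. A production system would more likely use a domain-tuned NER model.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entity:
    label: str  # entity type, e.g. "amount" or "date"
    text: str   # surface string found in the response
    start: int  # character offset in the response


# Hypothetical patterns for a financial scenario (illustration only).
PATTERNS = {
    "amount": re.compile(r"\d+(?:\.\d+)? yuan"),
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "product": re.compile(r"product [A-Z]\b"),
}


def extract_entities(response: str) -> List[Entity]:
    """Return all pattern matches, ordered by position in the response."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(response):
            found.append(Entity(label, match.group(0), match.start()))
    return sorted(found, key=lambda e: e.start)


response = "The customer transferred 5000 yuan on 2025-03-01 to purchase product A."
entities = extract_entities(response)
```

Each extracted entity keeps its type and character offset, giving the later decomposition step structured units to work with.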
[0049] In S104, based on N target entities and the relationships between different target entities, the original dialogue material is decomposed into multiple question-answer pairs, so that each question-answer pair corresponds to a target entity or a group of target entities with related relationships.
[0050] For example, knowledge graphs or semantic similarity calculation methods can be used to determine the relationships between different entities. For instance, when a contextual coupling relationship is detected between "financial product A" and "yield", they can be grouped into a set of related target entities.
[0051] In this step, a complex long-text dialogue (the original dialogue material) is broken down at the entity level into multiple semantically independent detection units (the question-answer pairs), thereby achieving fine-grained hallucination detection. This not only improves detection accuracy but also facilitates subsequent sample annotation and model optimization.
[0052] For example, suppose the original dialogue material includes: The model input question is "Please introduce Bank X's product Y." The model output response is "Y is a star current account wealth management product of Bank X. Its latest seven-day annualized yield is 3.5%, the risk level is R1, and the minimum investment is 1 cent." The extracted target entities include: {①Product type: current account wealth management; ②Indicator name: seven-day annualized yield; ③Value: 3.5%; ④Indicator name: risk level; ⑤Level: R1; ⑥Indicator name: minimum investment amount; ⑦Value: 1 cent}.
[0053] The following question-and-answer pairs can be broken down as follows: ① Question: What type of product is Bank X's Product Y? Answer: It is a current account wealth management product. ② Question: What is the latest seven-day annualized yield of Bank X's Product Y? Answer: It is 3.5%. Relationship: The indicator name (seven-day annualized yield) and the value (3.5%) together constitute a complete statement of fact about the yield. ③ Question: What are the risk level and minimum investment amount of Bank X's Product Y? Answer: The risk level is R1, and the minimum investment is 1 cent. Relationship: Risk level and minimum investment amount are both core attributes describing product risk and eligibility criteria. They are logically closely related and can therefore be combined into a question-and-answer pair for consistency testing.
[0054] The example above breaks down a complex model response containing multiple facts into three simpler, more focused verification tasks. Each question-and-answer pair verifies only one core fact (such as product type) or a set of closely related facts (such as risk level and minimum investment amount). These units are the smallest logical units for fact-checking.
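The grouping logic behind this decomposition can be sketched as follows, under the assumption that related entities are merged as connected components of a relation graph and that questions come from a fixed template. The specification prescribes neither, and the names `QAPair`, `group_entities`, and `build_qa_pairs` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class QAPair:
    question: str
    answer: str
    entities: Tuple[str, ...]


def group_entities(
    entities: Sequence[str], related: Sequence[Tuple[str, str]]
) -> List[List[str]]:
    """Merge related entities into groups (connected components via union-find)."""
    parent = {e: e for e in entities}

    def find(e: str) -> str:
        while parent[e] != e:
            parent[e] = parent[parent[e]]  # path halving
            e = parent[e]
        return e

    for a, b in related:
        parent[find(a)] = find(b)
    groups: Dict[str, List[str]] = {}
    for e in entities:
        groups.setdefault(find(e), []).append(e)
    return list(groups.values())


def build_qa_pairs(
    subject: str, facts: Dict[str, str], related: Sequence[Tuple[str, str]]
) -> List[QAPair]:
    """Emit one question-answer pair per entity group, using a fixed template."""
    pairs = []
    for group in group_entities(list(facts), related):
        question = f"What is the {' and '.join(group)} of {subject}?"
        answer = "; ".join(f"{name}: {facts[name]}" for name in group)
        pairs.append(QAPair(question, answer, tuple(group)))
    return pairs


facts = {
    "product type": "current account wealth management",
    "seven-day annualized yield": "3.5%",
    "risk level": "R1",
    "minimum investment amount": "1 cent",
}
related = [("risk level", "minimum investment amount")]
pairs = build_qa_pairs("Bank X's Product Y", facts, related)
```

With these inputs the sketch yields three pairs, matching the decomposition in the example: product type alone, yield alone, and risk level grouped with minimum investment amount.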
[0055] In one possible implementation, before decomposing the original dialogue material into multiple question-answer pairs based on N target entities and the relationships between them, the electronic device can input the N target entities and the original dialogue material into a language model. This allows the language model to perform consistency checks on the contextual logic and semantics of the original dialogue material around each target entity, outputting a summary detection result. This summary detection result describes whether there are logical contradictions in the contextual statements related to each target entity in the original dialogue material. This result is not a fine-grained label, but a macroscopic, general judgment. For example, there might be a logical conflict between the statement about the entity "rate of return" and the statement about the entity "investment period"; or, for example, all entity descriptions are consistent in contextual logic. If the summary detection result indicates a logical contradiction, step S104 is executed to decompose the original dialogue material into multiple question-answer pairs. If the summary detection result indicates no logical contradiction, the subsequent decomposition steps can be omitted.
[0056] By introducing the aforementioned pre-screening process, this step can quickly filter out a large number of logically consistent model responses at the macro level, avoiding the more costly fine-grained decomposition and detection of each response, thus significantly saving computational resources and processing time. This forms a two-layer detection architecture of "rapid macro-level screening + precise micro-level localization." The first layer prioritizes speed, handling most simple cases; the second layer (steps S104-S106) prioritizes accuracy, resolving complex and difficult cases. This architecture maximizes overall efficiency while ensuring detection capabilities.
[0057] Understandably, this pre-screening process is an optional optimization strategy, not a necessary procedure. In practice, the decision to introduce this pre-screening process can be made based on the priority requirements of accuracy and efficiency in the actual application.
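The control flow of the optional two-layer architecture can be sketched as below. The three callables stand in for language-model calls and are assumptions made for illustration. Only the ordering of the coarse check, the decomposition, and the fine-grained check comes from the text.

```python
from typing import Callable, List, Sequence, Tuple


def detect_with_prescreen(
    dialogue: str,
    entities: Sequence[str],
    coarse_check: Callable[[str, Sequence[str]], bool],
    decompose: Callable[[str, Sequence[str]], List[str]],
    fine_check: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Run the cheap summary check first; decompose only if it flags a contradiction."""
    if not coarse_check(dialogue, entities):
        return []  # Layer 1 found no contradiction, so S104 to S106 are skipped.
    return [(qa, fine_check(qa)) for qa in decompose(dialogue, entities)]


# Stub callables standing in for language-model calls (illustration only).
calls = {"fine": 0}


def stub_coarse(dialogue: str, entities: Sequence[str]) -> bool:
    return "but" in dialogue  # toy contradiction signal


def stub_decompose(dialogue: str, entities: Sequence[str]) -> List[str]:
    return [f"What does the response say about {e}?" for e in entities]


def stub_fine(qa: str) -> str:
    calls["fine"] += 1
    return "consistent"


clean = detect_with_prescreen(
    "Yield is 3.5%.", ["yield"], stub_coarse, stub_decompose, stub_fine
)
flagged = detect_with_prescreen(
    "Yield is 3.5% but also 5%.", ["yield"], stub_coarse, stub_decompose, stub_fine
)
```

Because the fine-grained checker runs only for the flagged dialogue, the cost of the second layer scales with the number of suspicious responses rather than with the full corpus.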
[0058] In S106, multiple question-answer pairs are input into the language model so that the language model can perform logical consistency checks on each question-answer pair and output the corresponding detection results. The detection results include: combinations of inconsistent conclusions and hallucination types, or consistent conclusions.
[0059] This step inputs each of the generated question-answer pairs into a language model. Specific prompts can be designed to guide the language model in acting as a checker. The language model's task is to analyze whether the question-answer pairs logically contradict the context or general knowledge. The detection results are output in a structured format, typically including consistent conclusions or inconsistent conclusions with specific hallucination types, such as factual errors, data errors, and logical conflicts. This achieves automated, high-volume identification of hallucinations in the model's output. Leveraging the powerful semantic understanding capabilities of the language model, it can uncover deep logical inconsistencies that traditional rule-based methods struggle to capture, significantly improving the detection coverage and intelligence level.
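One possible shape for the checker prompt and its structured output is sketched below. The prompt wording and the JSON schema are assumptions, since the specification states only that results are structured and names factual errors, data errors, and logical conflicts as example hallucination types. No language model is called here, and `raw_reply` is a hand-written stand-in for a model reply.

```python
import json
from typing import Dict, Optional

HALLUCINATION_TYPES = {"factual error", "data error", "logical conflict"}


def build_prompt(question: str, answer: str) -> str:
    """Assemble a hypothetical checker prompt for one question-answer pair."""
    return (
        "You are a consistency checker. Decide whether the answer is logically "
        "consistent with the question and with general knowledge.\n"
        f"Question: {question}\nAnswer: {answer}\n"
        'Reply with JSON: {"conclusion": "consistent"} or '
        '{"conclusion": "inconsistent", "hallucination_type": "<type>"} '
        f"where <type> is one of: {', '.join(sorted(HALLUCINATION_TYPES))}."
    )


def parse_detection(raw: str) -> Dict[str, Optional[str]]:
    """Validate the reply and normalise it to the two allowed result shapes."""
    data = json.loads(raw)
    conclusion = data.get("conclusion")
    if conclusion == "consistent":
        return {"conclusion": "consistent", "hallucination_type": None}
    h_type = data.get("hallucination_type")
    if conclusion == "inconsistent" and h_type in HALLUCINATION_TYPES:
        return {"conclusion": "inconsistent", "hallucination_type": h_type}
    raise ValueError(f"unexpected detection result: {raw!r}")


prompt = build_prompt("What is the risk level of Product Y?", "The risk level is R1.")
raw_reply = '{"conclusion": "inconsistent", "hallucination_type": "data error"}'
result = parse_detection(raw_reply)
```

Rejecting replies outside the two allowed shapes keeps malformed model output from leaking into the training labels.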
[0060] In one possible implementation, factual consistency detection can also be achieved by introducing external knowledge from domain authorities. A pre-defined knowledge base can be established, specifically for the domain, containing authoritative information such as core concepts, principles, standards, factual data, and typical cases. The knowledge base can be constructed through methods such as crawling authoritative domain literature, compiling expert experience, and digitizing standards and specifications. It can also be updated regularly to ensure the timeliness and accuracy of the knowledge, preventing outdated knowledge from affecting the detection results.
[0061] The electronic device can acquire a first vector derived from each target entity and a second vector derived from various knowledge items in a pre-set knowledge base. This transforms textual target entities and knowledge into computer-computable semantic vectors, achieving semantic similarity matching. The first and second vectors are generated using the same semantic encoding model. Then, for each target entity, the electronic device calculates the similarity between the first vector and each second vector. Similarity calculation can employ semantic matching algorithms such as cosine similarity and Euclidean distance to measure the semantic association between the first vector and each second vector. This allows the acquisition of target knowledge indicated by second vectors with similarity exceeding a preset threshold, or target knowledge indicated by the N most similar second vectors (N > 0).
[0062] Furthermore, the electronic device can input each question-and-answer pair and the target knowledge belonging to the same target entity as the question-and-answer pair into the language model, so that the language model can perform logical consistency and factual consistency checks on the question-and-answer pair and output the detection results. This embodiment greatly improves the accuracy and reliability of hallucination detection by introducing an external knowledge verification mechanism, fundamentally solving the knowledge conflict problem. The detection process no longer relies on the potentially outdated or erroneous parameterized knowledge within the language model, but is anchored to a controllable, updatable, and highly reliable external knowledge source, thereby accurately identifying factual hallucinations caused by model knowledge defects.
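The similarity-based retrieval of target knowledge can be sketched with cosine similarity over toy vectors. The two-dimensional vectors and knowledge item names below are placeholders, since in practice both vector sets would come from the same semantic encoding model. The function supports both selection rules described above: a similarity threshold and a fixed number of most similar items.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    entity_vec: Sequence[float],
    knowledge_vecs: Dict[str, Sequence[float]],
    threshold: Optional[float] = None,
    top_k: Optional[int] = None,
) -> List[str]:
    """Return knowledge items ranked by similarity, filtered by threshold and/or top_k."""
    scored = sorted(
        ((cosine_similarity(entity_vec, vec), item) for item, vec in knowledge_vecs.items()),
        reverse=True,
    )
    if threshold is not None:
        scored = [(score, item) for score, item in scored if score > threshold]
    if top_k is not None:
        scored = scored[:top_k]
    return [item for _, item in scored]


# Placeholder vectors (illustration only).
knowledge = {
    "yield of Product Y": [1.0, 0.0],
    "risk level of Product Y": [0.0, 1.0],
    "fees of Product Y": [0.7, 0.7],
}
entity_vec = [0.9, 0.1]
by_threshold = retrieve(entity_vec, knowledge, threshold=0.9)
by_top_k = retrieve(entity_vec, knowledge, top_k=2)
```

The retrieved items would then be passed to the language model together with the question-answer pair for the same target entity.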
[0063] In S108, based on each question-answer pair and the corresponding detection results, the first training sample for training the hallucination detection model is generated.
[0064] This step uses the question-answer pairs generated in S104 as input to the hallucination detection model and the detection results output by the language model in S106 as the supervision labels for the hallucination detection model, combining them into a large number of (input, output) sample pairs. For example, a training sample could have the input "Question-answer pair: The annualized rate of return of product A is 5%", and the corresponding output label "Hallucination type: Data error". This ultimately produces the first training sample for training the hallucination detection model. The first training sample generated by this method has a large data volume, low annotation cost, and high quality, making it possible to train a professional model that can quickly and accurately identify fine-grained hallucinations, thus solving the pain points of high cost, low efficiency, and narrow coverage caused by relying entirely on manual annotation.
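Assembling the first training samples can be sketched as pairing each question-answer pair with its detection result. The field names `input` and `label`, the "no hallucination" label string, and the JSON Lines serialization are assumptions, as the specification fixes only the (input, output) pairing.

```python
import json
from typing import Dict, List, Sequence, Tuple


def make_first_samples(
    qa_pairs: Sequence[Tuple[str, str]], detections: Sequence[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Turn (question, answer) pairs and detection results into labeled samples."""
    samples = []
    for (question, answer), det in zip(qa_pairs, detections):
        if det["conclusion"] == "consistent":
            label = "no hallucination"
        else:
            label = det["hallucination_type"]
        samples.append({"input": f"Question: {question}\nAnswer: {answer}", "label": label})
    return samples


qa_pairs = [
    ("What is the annualized rate of return of product A?", "It is 5%."),
    ("What type of product is product A?", "It is a wealth management product."),
]
detections = [
    {"conclusion": "inconsistent", "hallucination_type": "data error"},
    {"conclusion": "consistent"},
]
samples = make_first_samples(qa_pairs, detections)

# One JSON object per line, a common on-disk format for training data.
jsonl = "\n".join(json.dumps(s, ensure_ascii=False) for s in samples)
```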
[0065] This embodiment automates the entire process from raw dialogue material to the first training sample, eliminating reliance on tedious manual annotation, significantly reducing costs and improving efficiency. By breaking down complex raw dialogue material into question-answer pairs targeting specific entities or combinations of entities, fine-grained hallucination detection is achieved, improving detection accuracy. Moreover, it does not require access to the internal structure of the model that produces the raw dialogue material; by analyzing the model's input-output behavior, it determines whether the generated content contains hallucinations.
[0066] In some embodiments, the electronic device can also acquire reference information for the original dialogue material to assist the language model in generating more accurate reflections and corrections. The reference information includes at least one of the following:
[0067] ① Target knowledge corresponding to each target entity is obtained from a pre-built knowledge base; the acquisition of target knowledge is described above and will not be repeated here. Target knowledge originates from a pre-built, validated knowledge base, providing a reference standard for each target entity (such as concepts, indicators, and values) extracted during the named entity recognition stage. During reflection, the language model can accurately compare the content of the output response with the corresponding target knowledge, thereby identifying factual errors, data biases, or outdated information. For example, when the model outputs an incorrect value, the target knowledge can directly serve as evidence to indicate the error.
[0068] ② Historical context information of the model input question; by introducing historical context, the language model can determine whether the current output response contradicts the user's previously expressed intentions or needs, or commitments made earlier by the model itself. This effectively addresses deep hallucination problems, such as logical breaks and information conflicts, that only appear in multi-turn dialogues.
[0069] ③ Reference question-and-answer pairs that meet preset quality conditions, and whose semantic similarity to the model's input question is higher than a preset threshold. These reference question-and-answer pairs can come from human review, high-confidence outputs, or validated high-quality model responses, and can be considered a sample set of high-quality answers. During the reflection process, the language model can compare its current output with the reference question-and-answer pairs to identify differences in semantic coverage, factual accuracy, or expression logic. This contrastive learning-based mechanism helps the model extract correction patterns from high-quality samples, making the generated corrected responses closer to the ideal answer. For example, when the model identifies factual omissions or vague expressions in its own answer, it can refer to the handling methods of similar questions in high-quality question-and-answer examples to output corrected results with more reasonable structure and more accurate information.
[0070] The electronic device can input the original dialogue material and the reference information into a language model. The language model then reflects on the model output response based on the reference information and outputs reflection content. This reflection content includes the errors in the model output response and corrective measures for those errors. Errors include, but are not limited to: arithmetic errors, logical jumps, misread premises, missed conditions, factual errors, unit errors, knowledge confusion (e.g., confusing concepts in similar domains), causal inversion (e.g., using the result as a premise), ignored boundary conditions (e.g., not considering constraints in the problem), and redundant reasoning (e.g., invalid reasoning steps irrelevant to the conclusion). Corrective measures are actionable, not generalized suggestions. For example, for arithmetic errors, the corrective measure is to re-check the numerical substitution and calculation of the parameters in the formula, ensuring consistency in decimal places and units; for logical jumps, the corrective measure is to supply the missing link in the chain premise A → intermediate conclusion B → final conclusion C, clarifying the logical connection between B and both A and C. This step automates error diagnosis through the language model, replacing manual analysis, significantly reducing labor and time costs, and improving processing efficiency. The precise output of error points and correction ideas provides a clear direction for subsequent corrections, avoids blind regeneration by the model, and keeps the correction process targeted and effective.
[0071] Next, the electronic device can input the model's input question, model's output response, and reflection content into the language model, which will then output a corrected response. The model's input question provides the core task context, the model's output response provides a reference for error samples, and the reflection content provides the basis for correction. Together, these three elements constitute the corrected input context for the language model, ensuring that the model clearly understands what problem it needs to solve, where the original error lies, and how to correct it. The complete input of contextual information and the flexible implementation methods ensure the quality of the corrected response, significantly improving the success rate of converting incorrect reasoning into correct reasoning.
[0072] Finally, the electronic device can generate a second training sample for training the hallucination processing model based on at least one of the original dialogue material, the reflection content, the corrected response, and the reference information. By introducing such samples into subsequent training, the hallucination processing model can learn the full process from error recognition to correction, and thus acquire a stronger self-correction capability.
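As an illustration, the sketch below builds second training samples in the three input combinations listed in the third aspect, each supervised by the corrected response. The `CorrectionRecord` type, the text layout of the inputs, and the example record are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CorrectionRecord:
    question: str    # model input information
    response: str    # original model output response
    reflection: str  # errors found plus corrective measures
    reference: str   # e.g. target knowledge from the knowledge base
    corrected: str   # corrected response, used as the supervision label


def make_second_samples(rec: CorrectionRecord) -> List[Dict[str, str]]:
    """Build one sample per input combination, all sharing the corrected response as label."""
    first = f"Question: {rec.question}\nResponse: {rec.response}"
    second = f"{first}\nReflection: {rec.reflection}"
    third = f"{second}\nReference: {rec.reference}"
    return [{"input": text, "label": rec.corrected} for text in (first, second, third)]


# Hypothetical record (illustration only).
record = CorrectionRecord(
    question="What is the risk level of Product Y?",
    response="The risk level is R3.",
    reflection="Factual error: the reference lists R1; replace the stated level.",
    reference="Product Y risk level: R1",
    corrected="The risk level of Product Y is R1.",
)
second_samples = make_second_samples(record)
```

Training on all three combinations exposes the model to correction with progressively more supporting context. Whether to include every combination is a design choice left open by the text.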
[0073] This embodiment implements an intelligent hallucination recognition and correction process based on reference information. Knowledge base information improves factual consistency, historical context enhances logical coherence, and high-quality question answers provide a reference for expression and structural optimization, thereby significantly improving the accuracy, rationality, and interpretability of the generated corrected responses.
[0074] In some embodiments, the aforementioned first training samples capture the various factual errors, logical contradictions, and compliance issues that a model may encounter in specific domains (such as finance, healthcare, and law). On this basis, referring to Figure 2, this specification also provides a method for training a hallucination detection model. This method can be executed by an electronic device, including but not limited to servers (single-machine servers, cluster servers, cloud servers), desktop computers, portable computers, smart terminals (smartphones, tablets), and edge computing devices. Through this method, the hallucination detection model can be trained with high quality based on the aforementioned first training samples, thereby enabling it to recognize hallucinated content in a model's output. The method includes:
[0075] In S200, a first training sample is obtained. The first training sample includes a question-answer pair serving as the model input and a labeled hallucination identifier serving as the supervision label. The labeled hallucination identifier is either a first identifier indicating that no hallucination problem exists or a second identifier indicating a hallucination type.
[0076] Here, the question-answer pairs are the fine-grained units formed by named entity recognition and decomposition of the original dialogue material in the aforementioned embodiments. The hallucination identifiers serving as supervision labels are not obtained through costly manual annotation; they are generated automatically from the consistency checks performed by the language model in the aforementioned process. The identifier scheme can be designed as a set of classification labels: a first identifier for "no hallucination," such as "0," and second identifiers for specific hallucination types, for example "1" for factual error, "2" for logical conflict, "3" for data inconsistency, and so on. This provides large-scale, high-quality, accurately labeled data for model training, and addresses the core pain points of traditional hallucination detection methods, which rely on manual annotation and therefore suffer from high cost, limited coverage, and slow iteration.
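The identifier scheme described above can be illustrated with a small mapping from detection results to supervision labels. This is only a sketch: the numeric values follow the example in the text, while the dictionary name, the string keys, and the function `to_first_training_sample` are assumptions made for illustration.

```python
from typing import Dict, Optional, Union

# Numeric identifiers follow the example in the text; the string keys are assumed spellings.
HALLUCINATION_LABELS: Dict[str, int] = {
    "none": 0,                # first identifier: no hallucination
    "factual error": 1,       # second identifiers: one per hallucination type
    "logical conflict": 2,
    "data inconsistency": 3,
}


def to_first_training_sample(question: str, answer: str, consistent: bool,
                             hallucination_type: Optional[str] = None
                             ) -> Dict[str, Union[str, int]]:
    """Turn one question-answer pair and its detection result into a labeled sample."""
    if consistent:
        label = HALLUCINATION_LABELS["none"]
    else:
        # An inconsistent conclusion must come with a known hallucination type.
        if hallucination_type is None or hallucination_type == "none" \
                or hallucination_type not in HALLUCINATION_LABELS:
            raise ValueError(f"unknown hallucination type: {hallucination_type!r}")
        label = HALLUCINATION_LABELS[hallucination_type]
    return {"question": question, "answer": answer, "label": label}
```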
[0077] In S202, the question-answer pair is input into the hallucination detection model to be trained, and the predicted hallucination identifiers output by the hallucination detection model are obtained.
[0078] For example, the hallucination detection model to be trained can be a classification model, such as a text classifier based on the Transformer architecture. During training, the model receives the question-answer pair as input, performs semantic encoding and understanding through its multi-layer neural network, and produces a predicted hallucination identifier for that question-answer pair at the output layer. The predicted hallucination identifier represents the model's current judgment of whether the input content contains a hallucination and, if so, of which type. This realizes an automated detection inference process. Through repeated training, the model gradually internalizes the mapping from the text features of question-answer pairs to hallucination categories, and thus becomes able to screen new, unseen model outputs for hallucinations quickly and in batches.
[0079] In S204, the parameters of the hallucination detection model are adjusted with the optimization objective of minimizing the difference between the predicted hallucination identifier and the labeled hallucination identifier.
[0080] In practice, the optimization objective is achieved by defining a loss function (such as cross-entropy loss or KL divergence), which accurately quantifies the difference between the model's predictions and the true labels. Subsequently, the gradient of the loss function with respect to all model parameters is calculated using backpropagation, and optimization algorithms such as gradient descent are employed to iteratively update the parameters. By continuously minimizing the prediction error, the model parameters are constantly fine-tuned, making its discrimination rules increasingly consistent with the true criteria for hallucination detection. This not only directly improves the model's classification accuracy for known hallucination types but also enhances its generalization ability, enabling it to better handle novel hallucination patterns in complex and ever-changing real-world application scenarios. Ultimately, this ensures that the trained model possesses efficient and reliable hallucination detection capabilities.
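As a toy illustration of S202 and S204, the sketch below trains a linear softmax classifier with cross-entropy loss and plain gradient descent. It is deliberately simplified: the Transformer encoder is replaced by hand-made feature vectors, and all function names and hyperparameters are assumptions, not part of this specification.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(weights: List[List[float]], features: Sequence[float],
               label: int, lr: float) -> float:
    """One gradient-descent step; returns the cross-entropy loss before the update."""
    logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    probs = softmax(logits)
    loss = -math.log(max(probs[label], 1e-12))
    for k, row in enumerate(weights):
        # Gradient of cross-entropy w.r.t. logit k is (p_k - 1[k == label]).
        grad_logit = probs[k] - (1.0 if k == label else 0.0)
        for j in range(len(row)):
            row[j] -= lr * grad_logit * features[j]
    return loss


def train(samples: Sequence[Tuple[Sequence[float], int]], num_classes: int,
          num_features: int, epochs: int = 50, lr: float = 0.5):
    weights = [[0.0] * num_features for _ in range(num_classes)]
    history = []  # mean loss per epoch, to check that the loss decreases
    for _ in range(epochs):
        losses = [train_step(weights, x, y, lr) for x, y in samples]
        history.append(sum(losses) / len(losses))
    return weights, history


def predict(weights: List[List[float]], features: Sequence[float]) -> int:
    logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return max(range(len(logits)), key=logits.__getitem__)
```

In a real system the feature vector would be the encoder's representation of the question-answer pair and the gradients would flow through the whole network via backpropagation; the update rule for the output layer is the same.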
[0081] The hallucination detection model training method in this embodiment can utilize automatically generated structured first training samples to construct a hallucination detection model with high recognition accuracy, realize the automatic detection and classification of hallucination problems, reduce the burden of manual annotation, and significantly improve the authenticity and security of the model output content.
[0082] In some embodiments, referring to Figure 3, this specification also provides a method for training a hallucination processing model, which can be executed by an electronic device, including but not limited to servers (single-machine servers, cluster servers, cloud servers), desktop computers, portable computers, smart terminals (smartphones, tablets), and edge computing devices. This method allows for high-quality training of the hallucination processing model based on the aforementioned second training samples, thereby enabling the model to correct responses that contain hallucinations. The method includes:
[0083] In S300, a second training sample is obtained, which includes input samples and supervision labels; the input sample includes at least one of a first combination, a second combination, and a third combination, the first combination includes model input information and model output response, the second combination includes model input information, model output response, and reflection content, and the third combination includes model input information, model output response, reflection content, and reference information; the supervision label includes the corrected response.
[0084] The second training sample comes from the high-quality data automatically generated through the aforementioned language model reflection and correction process.
[0085] The first combination (model input information and model output response) aims to train the model for end-to-end direct correction, enabling it to identify and correct errors on its own. The second combination adds the reflection content to the first, guiding the model to learn a "diagnosis before treatment" reasoning logic, which makes its correction behavior more interpretable and improves its handling of complex errors. The third combination further adds the reference information (such as target knowledge and historical context), giving the model the richest decision-making context and training it to generate high-precision corrections that are closely aligned with authoritative evidence and the dialogue history. A unified supervision label (the corrected response) provides a common learning objective for all input combinations: generating a high-quality, hallucination-free final output.
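The three input combinations of S300 can be sketched as a small helper that expands one record into up to three supervised examples sharing the same label. The field names, the text template, and the function name `build_sft_examples` are hypothetical choices made for this illustration.

```python
from typing import Dict, List, Optional


def build_sft_examples(question: str, response: str, corrected: str,
                       reflection: Optional[str] = None,
                       reference: Optional[str] = None) -> List[Dict[str, str]]:
    """Expand one record into the input combinations that its fields allow."""
    examples: List[Dict[str, str]] = []

    # First combination: model input information + model output response.
    first = f"Question: {question}\nOriginal response: {response}"
    examples.append({"input": first, "label": corrected})

    if reflection is not None:
        # Second combination: first combination + reflection content.
        second = f"{first}\nReflection: {reflection}"
        examples.append({"input": second, "label": corrected})

        if reference is not None:
            # Third combination: second combination + reference information.
            third = f"{second}\nReference: {reference}"
            examples.append({"input": third, "label": corrected})

    return examples
```

Every example carries the corrected response as its label, so the same model sees the same target under progressively richer context.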
[0086] For example, the representations of the first, second, and third combinations can be chain-like expressions based on thought chains, thereby teaching the model how to think to obtain the correct answer, enhancing the model's inherent ability to understand, decompose, and solve complex logical problems, and ultimately training a hallucination processing model that is not only more accurate but also more robust and interpretable.
[0087] The chained expression of the first combination simulates a basic, intrinsic reasoning process; that of the second combination introduces explicit error analysis and correction strategies; and that of the third combination further incorporates external reference information for corroboration. On the one hand, this allows the model to learn problem-solving methods, not just the answers themselves, which improves its generalization ability and lets it apply what it has learned to complex hallucination cases it has not seen before. On the other hand, if the model also produces a chain of thought at inference time (e.g., outputting its reasoning steps before the final answer), its decision-making process becomes transparent and traceable. Users can see which entity caused the problem and which knowledge was used for correction, which increases user trust in the model and helps developers debug and optimize it when new problems arise.
[0088] In S302, the input sample is fed into the hallucination processing model to be trained, and the prediction result output by the hallucination processing model is obtained.
[0089] This step involves the model's forward inference and learning. The hallucination processing model to be trained receives the input samples, in their different combinations, obtained in S300. The model must understand the complex semantics of the input; for example, for the second and third combinations, it jointly processes the original question, the incorrect answer, the reflection logic, and even external knowledge. Based on its encoding and understanding of this information, the model outputs a text sequence, i.e., the prediction result, which is its prediction of the supervision label (the standard corrected answer). By repeating this process, the model gradually learns the mapping from the various input scenarios to the ideal corrected result.
[0090] In S304, the parameters of the hallucination processing model are adjusted with the optimization objective of minimizing the difference between the predicted result and the corrected response.
[0091] This step employs a supervised learning paradigm. The optimization objective is quantified by a loss function that measures the difference between the predicted result and the corrected response. The backpropagation algorithm is then used to calculate the gradient of this loss with respect to all model parameters, and the optimizer iteratively updates the model parameters along the gradient direction. In essence, this optimization objective forces the model to keep adjusting its internal representations and computational rules so that its output distribution is as close as possible to the ideal distribution defined by the high-quality corrected responses.
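The text does not name a specific loss function for S304. One common choice for sequence outputs, given here only as an assumed example, is the token-level negative log-likelihood of the corrected response. The symbols are introduced for illustration: $x$ is the input sample, $y = (y_1, \dots, y_T)$ is the corrected response, $p_\theta$ is the hallucination processing model with parameters $\theta$, and $\eta$ is the learning rate.

```latex
\mathcal{L}_{\mathrm{SFT}}(\theta)
  = -\sum_{t=1}^{T} \log p_\theta\!\left(y_t \mid x,\, y_{<t}\right),
\qquad
\theta \leftarrow \theta - \eta \, \nabla_\theta \mathcal{L}_{\mathrm{SFT}}(\theta)
```

Minimizing this loss raises the probability the model assigns to each token of the corrected response, which is one concrete reading of making the output distribution approach the distribution defined by the corrected responses.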
[0092] The hallucination processing model training method provided in this embodiment achieves efficient training of a dedicated hallucination processing model by utilizing automatically constructed second training samples. First, the complex reflection and correction capabilities of a large language model are compressed into a more efficient and easily deployable dedicated model. Second, through the design of multiple input sample combinations, the trained model possesses strong pattern adaptability and robustness, capable of flexibly handling input scenarios of varying complexity, and generating reliable correction results regardless of whether additional reflection or knowledge assistance is provided. Finally, the resulting hallucination processing model can be integrated into a production pipeline to optimize the output of preceding models online, significantly improving the overall system's output quality and reliability.
[0093] In some embodiments, referring to Figure 4, this specification also provides a method for training a hallucination processing model, which can be executed by an electronic device, including but not limited to servers (single-machine servers, cluster servers, cloud servers), desktop computers, portable computers, smart terminals (smartphones, tablets), and edge computing devices. This method allows for high-quality training of the hallucination processing model based on the aforementioned second training samples, thereby enabling the model to correct responses that contain hallucinations. The method includes:
[0094] In S400, a second training sample is obtained, which includes an input sample, a secondary supervision label, and a primary supervision label. The input sample includes at least one of the following: a model input question in the original dialogue material, a first combination, and a second combination. The first combination includes the model input question and reflection content, and the second combination includes the model input question, reflection content, and reference information. The secondary supervision label includes the model output response. The primary supervision label includes the corrected response.
[0095] For the input samples, samples containing only the model's input question are designed to train the model's ability to independently generate high-quality answers; a first combination containing the model's input question and reflective content guides the model to learn the reasoning process for diagnosis and correction; a second combination including the model's input question, reflective content, and reference information trains the model to make accurate corrections based on external evidence. This enables the model to adapt to correction scenarios of varying complexity.
[0096] For example, the representations corresponding to the first and second combinations can be chain-like expressions based on thought chains, thereby teaching the model how to think to obtain the correct answer, enhancing the model's inherent ability to understand, decompose and solve complex logical problems, and ultimately training a hallucination processing model that is not only more accurate, but also more robust and interpretable.
[0097] The secondary and primary supervision labels form a clear preference pair. The former represents low-quality answers with hallucination problems, while the latter represents high-quality answers that have been corrected, providing a clear learning objective for subsequent preference optimization.
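The preference pair of S400 can be sketched as a record with a prompt, a preferred answer, and a dispreferred answer. The key names `prompt`, `chosen`, and `rejected` are a common convention in preference-optimization tooling and are an assumption here, as is the function name `build_preference_record`.

```python
from typing import Dict, Optional


def build_preference_record(question: str, original_response: str,
                            corrected_response: str,
                            reflection: Optional[str] = None,
                            reference: Optional[str] = None) -> Dict[str, str]:
    """Build one preference pair: the corrected response is preferred over the original."""
    if original_response == corrected_response:
        # Identical answers carry no preference signal.
        raise ValueError("primary and secondary labels must differ")

    parts = [f"Question: {question}"]
    if reflection is not None:
        parts.append(f"Reflection: {reflection}")  # first combination
        if reference is not None:
            parts.append(f"Reference: {reference}")  # second combination

    return {
        "prompt": "\n".join(parts),
        "chosen": corrected_response,   # primary supervision label
        "rejected": original_response,  # secondary supervision label
    }
```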
[0098] In S402, the input samples are fed into the hallucination processing model to be trained and the baseline model respectively, to obtain the prediction results output by the hallucination processing model and the reference results output by the baseline model.
[0099] In this step, the hallucination processing model to be trained is responsible for learning the optimization strategy; while the baseline model can be an initial supervised fine-tuning model, such as the supervised hallucination processing model mentioned above or other language models, whose output distribution represents the behavioral pattern before training. By computing the output results of the two models on the same input sample in parallel, the degree of behavioral deviation of the hallucination processing model to be trained relative to the baseline model can be accurately quantified. This comparison mechanism provides the necessary computational foundation for subsequent constrained optimization, ensuring the stability of the model update direction.
[0100] In S404, on the premise that the difference in output distribution between the predicted result and the reference result does not exceed a preset threshold, the parameters of the hallucination processing model are adjusted with the optimization objective of moving the predicted result closer to the primary supervision label and farther from the secondary supervision label.
[0101] In this step, the optimization objective is achieved by adjusting the preference distribution. Specifically, under the condition that the difference in output distribution between the predicted result of the hallucination processing model to be trained and the reference result of the baseline model does not exceed a preset threshold, the update direction of the model parameters is guided so that the model output moves semantically closer to the corrected response represented by the primary supervision label while remaining distinct from the original response represented by the secondary supervision label.
[0102] The optimization objective considers three dimensions simultaneously. First, the distribution difference constraint ensures that model updates do not deviate excessively from the initial behavior, preserving the model's basic capabilities and output quality and avoiding mode collapse. Second, optimizing toward the primary supervision label enables the model to learn to generate outputs of the same quality as the corrected responses. Finally, optimizing away from the secondary supervision label leads the model to actively avoid the hallucination problems present in the original responses. This three-part objective can be achieved by combining a KL divergence constraint with a preference loss function. In this way, the model learns to recognize hallucinated content and to prefer corrected content while keeping its overall output stable, thereby improving its ability to self-correct hallucination problems.
[0103] The training method provided in this embodiment enhances the capabilities of the hallucination processing model by introducing preference learning and a distribution constraint mechanism. First, it enables the model not only to generate correct answers but also to distinguish high-quality from low-quality responses, building a semantics-based ability to judge quality. Second, the distribution constraint prevents performance degradation or abnormal output during optimization, supporting the stability and convergence of the training process. Finally, a model trained with this method has stronger intent alignment and generalization capabilities, enabling it to understand users' deeper needs and generate accurate answers that meet expectations, which improves its practical value in real-world applications.
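One way to combine a preference loss with a KL divergence constraint is sketched below. The text does not fix the exact formula, so this is an assumed formulation: a pairwise logistic preference loss on the log-probabilities of the two labels, plus a hinge penalty that is active only when the KL divergence from the baseline exceeds the preset threshold. The function names, `kl_threshold`, and `penalty_weight` are illustrative.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same outcomes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def preference_loss(logp_primary: float, logp_secondary: float) -> float:
    """-log(sigmoid(margin)), small when the primary label is much more likely."""
    margin = logp_primary - logp_secondary
    # Numerically stable softplus(-margin).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def constrained_objective(logp_primary: float, logp_secondary: float,
                          policy_dist: Sequence[float], baseline_dist: Sequence[float],
                          kl_threshold: float = 0.1, penalty_weight: float = 10.0) -> float:
    """Preference loss plus a penalty for drifting past the KL threshold."""
    kl = kl_divergence(policy_dist, baseline_dist)
    penalty = max(0.0, kl - kl_threshold)  # zero while the constraint is satisfied
    return preference_loss(logp_primary, logp_secondary) + penalty_weight * penalty
```

While the model stays within the threshold, only the preference term drives the update. Once it drifts too far from the baseline, the penalty term pushes it back, which corresponds to the stabilizing role the text assigns to the distribution constraint.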
[0104] The various technical features in the above embodiments can be combined arbitrarily, as long as there is no conflict or contradiction between the combinations of features. However, due to space limitations, they are not described one by one. Therefore, the arbitrary combination of various technical features in the above embodiments is also within the scope of this specification.
[0105] Figure 5 is a schematic structural diagram of a device provided in an exemplary embodiment. As shown in Figure 5, device 500 mainly consists of a communication interface 502, a user interface 504, a processor 506, and a data storage 508. These components are interconnected and communicate with each other via a system bus, network, or other connection mechanism 510. The communication interface 502 enables device 500 to communicate with other devices, access networks, and transmission networks via analog or digital modulation. For example, the communication interface 502 may include a chipset and antenna for wireless communication with a radio access network or access point. Furthermore, the communication interface 502 can be a wired interface such as Ethernet, Token Ring, or a USB port, or a wireless interface such as Wi-Fi, Bluetooth, Global Positioning System (GPS), or a wide-area wireless interface (e.g., WiMAX or LTE). Of course, the communication interface 502 can also support other forms of physical layer interfaces and standard or proprietary communication protocols. The communication interface 502 may also include multiple physical communication interfaces, such as Wi-Fi interfaces, Bluetooth interfaces, and wide-area wireless interfaces.
[0106] User interface 504 is used to receive user input and to provide output to the user. Therefore, user interface 504 may include input components such as a keypad, keyboard, touch-sensitive or presence-sensitive panel, computer mouse, trackball, joystick, microphone, still camera, and video camera, and output components such as a display screen (which may be combined with a touch-sensitive panel), CRT, LCD, LED, display using DLP technology, printer, and other similar devices known or developed in the future. User interface 504 may also generate auditory output via speakers, speaker jacks, audio output ports, audio output devices, headphones, and other similar devices known or developed in the future. In some embodiments, user interface 504 may include software, circuitry, or other forms of logic capable of transmitting data to and receiving data from external user input / output devices. Additionally or alternatively, device 500 may support remote access from other devices via communication interface 502 or another physical interface (not shown). User interface 504 may be configured to receive user input, the position and movement of which may be indicated by indicators or cursors described herein. User interface 504 may also be configured as a display device for rendering or displaying text fragments.
[0107] Processor 506 may contain one or more general-purpose processors and / or special-purpose processors.
[0108] Data storage 508 may include one or more volatile and / or non-volatile storage components and may be integrated wholly or partially with processor 506. Data storage 508 may include removable and non-removable components.
[0109] Processor 506 is capable of executing program instructions 518 (e.g., compiled or uncompiled program logic and / or machine code) stored in data storage 508 to perform the various functions described herein. Data storage 508 may contain a non-transitory computer-readable medium on which program instructions are stored, which, when executed by device 500, enable device 500 to perform any methods, processes, or functions disclosed in this specification and / or the accompanying drawings. Execution of program instructions 518 by processor 506 may result in processor 506 using data 512.
[0110] For example, program instructions 518 may include an operating system 522 (e.g., an operating system kernel, device drivers, and / or other modules) installed on device 500 and one or more applications 520 (e.g., a browser, social application, or game application). Similarly, data 512 may include operating system data 516 and application data 514. Operating system data 516 is primarily accessible to the operating system 522, while application data 514 is primarily accessible to one or more applications 520. Application data 514 may reside in a file system visible or hidden from the user of device 500.
[0111] Application 520 can communicate with operating system 522 through one or more application programming interfaces (APIs). These APIs help application 520 read and / or write application data 514, transmit or receive information via communication interface 502, receive or display information on user interface 504, etc.
[0112] In some terminology, application 520 may be simply referred to as "app". Furthermore, application 520 can be downloaded to device 500 through one or more online app stores or app markets. However, applications can also be installed on device 500 in other ways, such as through a web browser or a physical interface on device 500 (e.g., a USB port).
[0113] In some embodiments, the training sample generation apparatus can be applied to a device such as the one shown in Figure 5 to implement the technical solution of this specification. The training sample generation apparatus may include:
[0114] The original dialogue material acquisition module is used to acquire the original dialogue material of the target application scenario. The original dialogue material includes the model input question and the model output response.
[0115] The named entity recognition module is used to perform named entity recognition on the model's output response and extract N target entities related to the target application scenario, where N > 0;
[0116] The original dialogue material decomposition module is used to decompose the original dialogue material into multiple question-answer pairs based on N target entities and the relationships between different target entities, so that each question-answer pair corresponds to a target entity or a group of target entities with relationships.
[0117] The question-answer pair detection module is used to input multiple question-answer pairs into the language model so that the language model can perform logical consistency detection on each question-answer pair and output the corresponding detection results. The detection results include: combinations of inconsistent conclusions and hallucination types, or consistent conclusions.
[0118] The first training sample generation module is used to generate the first training samples for training the hallucination detection model based on each question-answer pair and the corresponding detection results.
[0119] In one implementation, a target knowledge acquisition module is also included, which is used to acquire a first vector converted from each target entity, and to acquire a second vector converted from each knowledge item in a preset knowledge base; for each target entity, based on the similarity between the first vector of the target entity and each second vector, target knowledge indicated by the second vector with a similarity higher than a preset threshold, or target knowledge indicated by the N second vectors with the highest similarity, where N>0, is acquired.
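The retrieval performed by the target knowledge acquisition module can be sketched with cosine similarity over plain Python lists. The text does not specify the similarity measure or the embedding model, so cosine similarity, the dictionary-based knowledge base, and the function names below are assumptions for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_target_knowledge(entity_vector: Sequence[float],
                              knowledge_vectors: Dict[str, Sequence[float]],
                              threshold: Optional[float] = None,
                              top_n: Optional[int] = None) -> List[str]:
    """Return knowledge items above a similarity threshold, or the top_n most similar."""
    scored = sorted(
        ((cosine_similarity(entity_vector, vec), key)
         for key, vec in knowledge_vectors.items()),
        reverse=True,
    )
    if threshold is not None:
        return [key for score, key in scored if score > threshold]
    if top_n is not None:
        return [key for _, key in scored[:top_n]]
    raise ValueError("provide either threshold or top_n")
```

For example, with knowledge vectors `{"a": [1, 0], "b": [0, 1], "c": [1, 1]}` and entity vector `[1, 0]`, a threshold of 0.5 selects items `a` and `c`, while `top_n=1` selects only `a`.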
[0120] The question-answer pair detection module is specifically used to input each question-answer pair and the target knowledge belonging to the same target entity as the question-answer pair into the language model, so that the language model can perform logical consistency detection and factual consistency detection on the question-answer pair and output the detection results.
[0121] In one implementation, the system also includes an original dialogue material detection module, which is used to input N target entities and the original dialogue material into a language model so that the language model outputs a summary detection result. The summary detection result is used to describe whether there are logical contradictions in the contextual descriptions related to each target entity in the original dialogue material. If the summary detection result indicates that there are logical contradictions, the original dialogue material is decomposed into multiple question-answer pairs based on the N target entities and the relationships between different target entities.
[0122] In one implementation, a second training sample generation module is further included, used to obtain reference information of the original dialogue material; wherein, the reference information includes at least one of the following: target knowledge corresponding to each target entity obtained from a pre-set knowledge base, historical context information of the model input question, and reference question-answer pairs that meet preset quality conditions; the original dialogue material and the reference information are input into a language model, so that the language model reflects on the model output response based on the reference information and outputs reflection content, the reflection content including the error points in the model output response and the correction ideas for the error points; the model input question, the model output response, and the reflection content are input into the language model, so that the language model outputs a corrected response; and a second training sample for training the hallucination processing model is generated based on at least one of the original dialogue material, the reflection content, the corrected response, and the reference information.
[0123] In some embodiments, the training apparatus for the hallucination detection model can be applied to a device such as the one shown in Figure 5 to implement the technical solution of this specification. The training apparatus for the hallucination detection model may include:
[0124] The sample acquisition module is used to acquire first training samples. Each first training sample includes a question-answer pair serving as the model input and a labeled hallucination identifier serving as the supervision label. The labeled hallucination identifier is either a first identifier indicating that no hallucination problem exists or a second identifier indicating a hallucination type.
[0125] The model training module is used to input question-answer pairs into the hallucination detection model to be trained, and obtain the predicted hallucination identifiers output by the hallucination detection model.
[0126] The model training module is also used to adjust the parameters of the hallucination detection model with the optimization objective of minimizing the difference between the predicted hallucination identifiers and the labeled hallucination identifiers.
[0127] In some embodiments, the training apparatus for the hallucination processing model can be applied to a device such as the one shown in Figure 5 to implement the technical solution of this specification. The training apparatus for the hallucination processing model may include:
[0128] The sample acquisition module is used to acquire a second training sample, which includes input samples and supervision labels. The input samples include at least one of a first combination, a second combination, and a third combination. The first combination includes model input information and model output response. The second combination includes model input information, model output response, and reflection content. The third combination includes model input information, model output response, reflection content, and reference information. The supervision labels include corrected responses.
[0129] The model training module is used to input the input samples into the hallucination processing model to be trained and obtain the prediction results output by the hallucination processing model.
[0130] The model training module is also used to adjust the parameters of the hallucination processing model with the optimization objective of minimizing the difference between the predicted result and the corrected response.
[0131] In some embodiments, the training apparatus for the hallucination processing model can be applied to a device such as the one shown in Figure 5 to implement the technical solution of this specification. The training apparatus for the hallucination processing model may include:
[0132] The sample acquisition module is used to acquire a second training sample, which includes an input sample, a secondary supervision label, and a primary supervision label. The input sample includes at least one of the following: a model input question in the original dialogue material, a first combination, and a second combination. The first combination includes the model input question and reflection content, and the second combination includes the model input question, reflection content, and reference information. The secondary supervision label includes the model output response. The primary supervision label includes the corrected response.
[0133] The model training module is used to input the input samples into the hallucination processing model to be trained and the baseline model, respectively, to obtain the prediction results output by the hallucination processing model and the reference results output by the baseline model.
[0134] The model training module is also used to adjust the parameters of the hallucination processing model, with the goal of making the prediction results closer to the preferred supervision label and farther away from the secondary supervision label, provided that the difference in output distribution between the prediction results and the reference results does not exceed a preset threshold.
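One possible reading of this constrained preference objective is sketched below. This specification does not name a loss function or a distribution distance; the sketch assumes a DPO-style logistic loss on log-probability margins relative to the reference results, and a KL divergence check against the preset threshold. The function names and the `beta` parameter are hypothetical.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_loss(
    logp_preferred: float,      # log-prob of the corrected response under the trained model
    logp_secondary: float,      # log-prob of the original model output response under the trained model
    ref_logp_preferred: float,  # same quantities under the reference (baseline) model
    ref_logp_secondary: float,
    beta: float = 0.1,          # assumed scaling factor, not taken from the specification
) -> float:
    """DPO-style loss: decreases as the preferred label gains probability
    relative to the secondary label, measured against the reference model."""
    margin = beta * (
        (logp_preferred - ref_logp_preferred)
        - (logp_secondary - ref_logp_secondary)
    )
    return softplus(-margin)  # equals -log(sigmoid(margin))


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def within_threshold(
    pred_dist: Sequence[float], ref_dist: Sequence[float], threshold: float
) -> bool:
    """Check the constraint that the output distributions stay close (assumed metric: KL)."""
    return kl_divergence(pred_dist, ref_dist) <= threshold
```

Under this reading, a parameter update would be accepted, or a penalty left inactive, only while `within_threshold` holds, which keeps the trained model from drifting far from the reference behavior while it learns to prefer the corrected response.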
[0135] For ease of description, the above devices are described by dividing them into various modules or units based on their functions. Of course, when implementing one or more embodiments of this specification, the functions of each module or unit can be implemented in the same or different software and/or hardware, or a module that performs a given function can be implemented by a combination of multiple sub-modules or sub-units, etc. The device embodiments described above are merely illustrative. For example, the division of units is only a logical functional division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
[0136] Based on the same concept as the methods described above, this specification also provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor performs the steps of the method as described in any of the above embodiments by executing the executable instructions.
[0137] Based on the same concept as the methods described above, this specification also provides a computer-readable storage medium having computer instructions stored thereon that, when executed by a processor, implement the steps of the methods as described in any of the above embodiments.
[0138] Based on the same concept as the methods described above, this specification also provides a computer program product, including a computer program / instructions that, when executed by a processor, implement the steps of the methods as described in any of the above embodiments.
[0139] Those skilled in the art will understand the following:
[0140] In this specification, the terms "comprising," "including," or any other variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, product, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, product, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, product, or apparatus that includes said elements is not excluded.
[0141] In this specification, “a,” “an,” and “the” do not specifically refer to the singular, but may also include the plural.
[0142] In this specification, ordinal numbers such as "first," "second," etc., do not necessarily indicate order; they are often used to distinguish between objects. For example, "first server" and "second server" usually refer to two servers. To differentiate between these two servers, they are described as "first server" and "second server." Of course, sometimes these two servers may be the same server.
[0143] In this specification, unless explicitly stated otherwise, "receiving and sending data" does not necessarily mean direct receiving and sending; it can also mean indirect receiving and sending. For example, A receiving data sent by B can be understood as A directly receiving the data sent by B, or it can be understood as A indirectly receiving the data sent by B through other entities such as C. Similarly, B sending data to A can be understood as B sending the data directly to A, or it can be understood as B indirectly sending the data to A through other entities such as C. Here, C can be one entity, or it can be two or more entities.
[0144] In this specification, unless explicitly stated otherwise, the relationships between structures can be direct or indirect. For example, when describing "A is connected to B," unless it is explicitly stated that A and B are directly connected, it should be understood that A can be directly connected to B or indirectly connected to B. Similarly, when describing "A is on top of B," unless it is explicitly stated that A is directly above B (A and B are adjacent, and A is above B), it should be understood that A can be directly above B or indirectly above B (A and B are separated by other elements, and A is above B). And so on.
[0145] This specification uses specific terms to describe embodiments thereof. Terms such as "an embodiment," "one embodiment," and / or "some embodiments" refer to a particular feature, structure, or characteristic associated with at least one embodiment of this specification. Therefore, it should be emphasized and noted that references to "an embodiment," "one embodiment," or "an alternative embodiment" in different locations throughout this specification do not necessarily refer to the same embodiment. Furthermore, those skilled in the art can combine and integrate the different embodiments or examples described herein, as well as the features of those different embodiments or examples, without contradiction.
[0146] Although one or more embodiments of this specification provide method steps as described in the embodiments or flowcharts, it is understood that the order of steps listed in the embodiments or flowcharts is only one of many possible execution orders and does not represent the only execution order. Therefore, when the claims involve method steps, any changes or adjustments to the order of such steps, or the parallelism between steps, are also within the scope of protection of the claims.
Claims
1. A method for generating training samples, comprising: Obtain the original dialogue material of the target application scenario, wherein the original dialogue material includes the model input question and the model output response; Named entity recognition is performed on the model's output response to extract N target entities related to the target application scenario, where N > 0; Based on the N target entities and the relationships between different target entities, the original dialogue material is decomposed into multiple semantically independent question-answer pairs, such that each question-answer pair corresponds to a target entity or a group of target entities with related relationships; wherein, the relationships between different target entities are determined using knowledge graphs or through semantic similarity calculation; The multiple question-answer pairs are input into the language model respectively, so that the language model performs logical consistency detection on each question-answer pair and outputs the corresponding detection results. The detection results include: a combination of inconsistent conclusions and hallucination types, or a consistent conclusion. Based on each question-answer pair and the corresponding detection results, the first training sample is generated for training the hallucination detection model.
2. The method according to claim 1, further comprising: Obtain a first vector transformed from each of the target entities, and obtain a second vector transformed from each knowledge item in a preset knowledge base; For each target entity, based on the similarity between the first vector of the target entity and each of the second vectors, target knowledge indicated by the second vector with a similarity higher than a preset threshold, or target knowledge indicated by the N second vectors with the highest similarity, where N>0, is obtained; The step of inputting the multiple question-answer pairs into a language model, so that the language model performs logical consistency detection on each question-answer pair and outputs the corresponding detection results, includes: Each question-answer pair and the target knowledge belonging to the same target entity as the question-answer pair are input into the language model, so that the language model performs logical consistency detection and factual consistency detection on the question-answer pair and outputs the detection results.
3. The method according to claim 1, further comprising, before decomposing the original dialogue material into multiple question-answer pairs based on the N target entities and the relationships between different target entities: The N target entities and the original dialogue material are input into the language model so that the language model outputs a summary detection result; wherein, the summary detection result is used to describe whether there are logical contradictions in the contextual descriptions related to each target entity in the original dialogue material; The step of decomposing the original dialogue material into multiple question-answer pairs based on the N target entities and the relationships between different target entities includes: If the summary detection result indicates a logical contradiction, the original dialogue material is decomposed into multiple question-answer pairs based on the N target entities and the relationships between different target entities.
4. The method according to any one of claims 1 to 3, further comprising: Obtain reference information for the original dialogue material; wherein, the reference information includes at least one of the following: target knowledge corresponding to each target entity obtained from a preset knowledge base, historical context information of the model input question, and reference question-answer pairs that meet preset quality conditions; The original dialogue material and the reference information are input into the language model so that the language model can reflect on the model output response based on the reference information and output reflection content, which includes the error points in the model output response and the correction ideas for the error points; The model input question, the model output response, and the reflection content are input into the language model, so that the language model can output a corrected response; Based on at least one of the original dialogue material, the reflection content, the corrected response, and the reference information, a second training sample is generated for training the hallucination processing model.
5. A training method for a hallucination detection model, comprising: Obtain a first training sample generated by the method according to any one of claims 1 to 3, the first training sample including question-answer pairs as model input, and labeled hallucination identifiers as supervision labels, the labeled hallucination identifiers including a first identifier for indicating that no hallucination exists and a second identifier for indicating a hallucination of any type; The question-answer pair is input into the hallucination detection model to be trained, and the predicted hallucination identifier output by the hallucination detection model is obtained; The parameters of the hallucination detection model are adjusted with the optimization objective of minimizing the difference between the predicted hallucination identifier and the labeled hallucination identifier.
6. A training method for a hallucination processing model, comprising: Obtain a second training sample generated based on the method of claim 4, the second training sample comprising an input sample and a supervision label; The input sample includes at least one of a first combination, a second combination, and a third combination. The first combination includes model input information and model output response. The second combination includes the model input information, the model output response, and reflection content. The third combination includes the model input information, the model output response, the reflection content, and reference information. The supervision label includes a corrected response; The input sample is input into the hallucination processing model to be trained, and the prediction result output by the hallucination processing model is obtained; The parameters of the hallucination processing model are adjusted with the optimization objective of minimizing the difference between the prediction result and the corrected response.
7. A training method for a hallucination processing model, comprising: Obtain a second training sample generated based on the method of claim 4. The second training sample includes an input sample, a secondary supervision label, and a preferred supervision label. The input sample includes at least one of the following: a model input question, a first combination, and a second combination from the original dialogue material. The first combination includes the model input question and reflection content, and the second combination includes the model input question, the reflection content, and reference information. The secondary supervision label includes the model output response. The preferred supervision label includes a corrected response. The input samples are respectively input into the hallucination processing model to be trained and the benchmark model to obtain the prediction results output by the hallucination processing model and the reference results output by the benchmark model. Provided that the difference in output distribution between the predicted result and the reference result does not exceed a preset threshold, the parameters of the hallucination processing model are adjusted with the optimization objective of making the predicted result closer to the preferred supervision label and farther away from the secondary supervision label.
8. An electronic device, characterized by comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor implements the steps of the method as described in any one of claims 1-7 by executing the executable instructions.
9. A computer-readable storage medium, characterized in that it stores computer instructions that, when executed by a processor, implement the steps of the method as described in any one of claims 1-7.
10. A computer program product, characterized in that it includes a computer program/instructions that, when executed by a processor, implement the steps of the method as described in any one of claims 1-7.