Hallucination detection method, apparatus, device, medium and product
By using a comparison method between semantically similar variant problem texts and independent judge models, the problem of poor effectiveness in hallucination detection during self-verification of large language models is solved, and a more efficient hallucination detection effect is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA UNITED NETWORK COMM GRP CO LTD
- Filing Date
- 2026-02-14
- Publication Date
- 2026-06-19
AI Technical Summary
In existing hallucination detection schemes, large language models cannot effectively identify systematic hallucinations during self-verification due to their own knowledge errors or semantic biases, resulting in poor hallucination detection effectiveness.
By replacing repeated inputs of the same question with semantically similar variant question texts, the answer difference space is expanded. Then, a referee model independent of the tested model is used for comparison to determine the hallucination detection results, thus avoiding missed detections caused by the tested model's own knowledge bias during self-verification.
It improves the accuracy and recall of hallucination detection, enhances the effectiveness of hallucination detection, and avoids missed detections caused by the knowledge bias of the tested model during self-verification.
Figure CN122240434A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of large language model technology, and in particular to a method, apparatus, device, medium and product for hallucination detection.
Background Technology
[0002] Large Language Models (LLMs) are deep learning models trained on massive amounts of text data, enabling them to generate natural language text or understand the meaning of text. With the widespread application of LLMs across various fields, the reliability of their generated content has become a critical challenge. Specifically, without external knowledge support, LLMs may output content that contradicts the facts, that is, hallucinations, leading users to treat incorrect information as a reliable answer and make decisions accordingly.
[0003] Current hallucination detection schemes based on output content consistency primarily involve repeatedly inputting the same question from a user, allowing a large language model to perform a self-consistent comparison of the multiple output responses to evaluate whether the model's output responses contain hallucinations. However, using the same large language model for both generation and verification means that if the model itself has knowledge errors or semantic biases, the comparison process remains within the same error framework, failing to identify systematic hallucinations and resulting in poor effectiveness in hallucination detection.
[0004] Therefore, there is an urgent need for a solution that can improve the effectiveness of hallucination detection.
Summary of the Invention
[0005] The hallucination detection methods, apparatus, devices, media, and products provided in this application are intended to improve the effectiveness of hallucination detection.
[0006] In a first aspect, embodiments of this application provide a hallucination detection method, including:
[0007] The system obtains the original question text from the user and inputs it into the model under test to obtain the output original answer text; the model under test is a large language model.
[0008] Based on the original question text, multiple variant question texts are determined, and each variant question text is input into the model under test to obtain the output variant answer text that corresponds one-to-one with the variant question text; wherein, the variant question text is semantically similar to the original question text;
[0009] Based on the original answer text, each variant answer text, and the referee model, the hallucination detection result is determined; wherein the referee model is a large language model independent of the model under test, and the hallucination detection result characterizes the hallucination status of the output text of the model under test.
[0010] Optionally, as described above, based on the original question text, multiple variant question texts are determined, including:
[0011] Based on the original question text and the preset first prompt text, multiple variant question texts are determined using the referee model; the preset first prompt text is used to guide the referee model to output multiple variant question texts based on the original question text.
[0012] Optionally, as described above, based on the original question text, multiple variant question texts are determined, including:
[0013] Based on preset reconstruction and combination rules, the original question text is reconstructed to generate multiple variant question texts; wherein the preset reconstruction and combination rules include one or more of synonym replacement, syntax inversion, and word order rearrangement.
[0014] Optionally, as described above, the hallucination detection result is determined based on the original answer text, the variant answer texts, and the referee model, including:
[0015] For each variant answer text, the variant answer text and the original answer text are identified as an answer pair, and the evaluation result of the answer pair is determined based on the referee model; whereby the evaluation result characterizes the degree of consistency between the variant answer text and the original answer text.
[0016] The hallucination detection results are determined based on the evaluation results of each answer pair.
[0017] Optionally, as described above, the evaluation result of the answer pair is determined based on the referee model, including:
[0018] Based on the answer pair and the preset second prompt text, the evaluation result of the answer pair is determined using the referee model; wherein the preset second prompt text is used to guide the referee model to output the evaluation result of the answer pair.
[0019] Optionally, as described above, the hallucination detection results are determined based on the evaluation results of each pair of answers, including:
[0020] For each evaluation result, an evaluation score is determined based on a pre-defined correlation; whereby the evaluation score represents the quantitative value of the evaluation result's contribution to the hallucination.
[0021] Based on the evaluation scores, a hallucination score is determined; the hallucination score represents the probability that the original answer text is a hallucination.
[0022] The hallucination detection result is determined based on the hallucination score.
[0023] Optionally, as described above, the hallucination detection result is determined based on the hallucination score, including:
[0024] Determine the judgment threshold;
[0025] If the hallucination score is greater than or equal to the judgment threshold, then the hallucination detection result is determined to represent the original answer text as hallucination text.
[0026] In a second aspect, embodiments of this application provide a hallucination detection apparatus, comprising:
[0027] The acquisition module is used to acquire the original question text from the user and input the original question text into the model under test to obtain the output original answer text; where the model under test is a large language model;
[0028] The determination module is used to determine multiple variant question texts based on the original question text, and input each variant question text into the model under test to obtain the output variant answer texts that correspond one-to-one with the variant question texts; wherein, the variant question texts are semantically similar to the original question texts;
[0029] The detection module is used to determine the hallucination detection result based on the original answer text, the various variant answer texts, and the referee model. The referee model is a pre-trained large language model that is independent of the model under test. The hallucination detection result represents the hallucination status of the output text of the model under test.
[0030] In a third aspect, embodiments of this application provide an electronic device, including: a memory and a processor;
[0031] The memory stores computer-executable instructions;
[0032] The processor executes the computer-executable instructions stored in the memory, causing the processor to perform the method of the first aspect and/or any possible implementation of the first aspect as described above.
[0033] In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method of the first aspect and/or any possible implementation of the first aspect.
[0034] In a fifth aspect, embodiments of this application provide a computer program product, including a computer program that, when executed by a processor, implements the method of the first aspect and/or any possible implementation of the first aspect.
[0035] In the hallucination detection method, apparatus, device, medium, and product provided in the embodiments of this application, the original question text is obtained from the user and input into the model under test to obtain the output original answer text. Based on the original question text, multiple variant question texts are determined, and each variant question text is input into the model under test to obtain the output variant answer text that corresponds one-to-one with the variant question text. Based on the original answer text, each variant answer text, and the referee model, the hallucination detection result is determined. The model under test is a large language model, the variant question texts are semantically similar to the original question text, the referee model is a large language model independent of the model under test, and the hallucination detection result characterizes the hallucination status of the output text of the model under test. The method expands the answer difference space by replacing repeated inputs of the same question with semantically similar variant question texts, making potential factual conflicts easier to identify. The hallucination detection result is determined by comparing the original answer text with the multiple variant answer texts using a referee model independent of the model under test. This avoids missed detections caused by the knowledge bias of the model under test during self-verification, thereby improving the accuracy and overall effectiveness of hallucination detection.
Description of the Drawings
[0036] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.
[0037] Figure 1 is a first flowchart of the hallucination detection method provided in this application;
[0038] Figure 2 is a second flowchart of the hallucination detection method provided in this application;
[0039] Figure 3 is a schematic diagram of the architecture of the hallucination detection method provided in this application;
[0040] Figure 4 is a schematic diagram of the hallucination detection apparatus provided in this application;
[0041] Figure 5 is a schematic diagram of the structure of the electronic device provided in this application.
[0042] The accompanying drawings illustrate specific embodiments of this application, which will be described in more detail below. These drawings and descriptions are not intended to limit the scope of the concept in any way, but rather to illustrate the concept of this application to those skilled in the art through reference to particular embodiments.
Detailed Description
[0043] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.
[0044] Large language models are deep learning models trained on massive amounts of text data, enabling them to generate natural language text or understand the meaning of text. With the widespread application of large language models across various fields, the reliability of their generated content has become a key challenge. Specifically, without external knowledge support, large language models may output content that is inconsistent with the facts, i.e., hallucinations, causing users to treat incorrect information as credible answers and make decisions accordingly.
[0045] Current hallucination detection schemes based on output content consistency primarily involve repeatedly inputting the same question from a user, allowing a large language model to perform a self-consistent comparison of the multiple output responses to evaluate whether the model's output responses contain hallucinations. However, using the same large language model for both generation and verification means that if the model itself has knowledge errors or semantic biases, the comparison process remains within the same error framework, failing to identify systematic hallucinations and resulting in poor effectiveness in hallucination detection.
[0046] Therefore, there is an urgent need for a solution that can improve the effectiveness of hallucination detection.
[0047] The hallucination detection method provided in this application expands the answer difference space by replacing repeated inputs of the same question with semantically similar variant question texts, making potential factual conflicts easier to identify. By comparing the original answer text with multiple variant answer texts using a referee model independent of the tested model, the hallucination detection result is determined. This avoids missed detections due to knowledge bias during self-verification by the tested model, thereby improving the accuracy of hallucination detection. The method of this application improves the effectiveness of hallucination detection.
[0048] The technical solution of this application and how the technical solution of this application solves the above-mentioned technical problems are described in detail below with specific embodiments. These specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments. The embodiments of this application will now be described with reference to the accompanying drawings.
[0049] Figure 1 is a first flowchart of the hallucination detection method provided in this application. The method may be executed by a host, a server, or another device. As shown in Figure 1, the method includes:
[0050] S101. Obtain the original question text from the user and input the original question text into the model under test to obtain the output original answer text; wherein, the model under test is a large language model.
[0051] S102. Based on the original question text, determine multiple variant question texts and input each variant question text into the model under test to obtain the output variant answer texts that correspond one-to-one with the variant question texts; wherein, the variant question texts are semantically similar to the original question texts.
[0052] S103. Determine the hallucination detection result based on the original answer text, each variant answer text, and the referee model; wherein, the referee model is a large language model and is independent of the tested model, and the hallucination detection result represents the hallucination situation of the output text of the tested model.
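Steps S101 to S103 can be sketched as a small pipeline. This is only an illustrative outline, not the implementation of this application: the function name `detect_hallucination`, the callable parameters, and the toy stand-ins are all hypothetical, and no real model is called.

```python
from typing import Callable, List


def detect_hallucination(
    question: str,
    tested_model: Callable[[str], str],
    make_variants: Callable[[str], List[str]],
    compare: Callable[[str, List[str]], bool],
) -> bool:
    """Return True if the original answer is flagged as a hallucination."""
    # S101: query the model under test with the original question.
    original_answer = tested_model(question)
    # S102: query the same model with each semantically similar variant.
    variant_answers = [tested_model(v) for v in make_variants(question)]
    # S103: a comparison step independent of the model under test decides.
    return compare(original_answer, variant_answers)


# Toy stand-ins (hypothetical) so the sketch runs without any real model.
def toy_model(q: str) -> str:
    # Deliberately inconsistent: the answer depends on the surface wording.
    return "Yes" if q.startswith("Was") else "No"


def toy_variants(q: str) -> List[str]:
    return ["Is Johnny Depp from Boston?", "Is Boston Johnny Depp's birthplace?"]


def toy_compare(original: str, variants: List[str]) -> bool:
    # Flags a hallucination if any variant answer differs from the original.
    return any(v != original for v in variants)


flagged = detect_hallucination(
    "Was Johnny Depp born in Boston?", toy_model, toy_variants, toy_compare
)
```

In a real deployment the `compare` callable would wrap the referee model, and `make_variants` would wrap either a referee-model prompt or a rule-based rewriter.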
[0053] In step S101, the original question text can refer to any string submitted by the user in natural language form, in which the user expects to receive a factual response, including but not limited to interrogative sentences, declarative sentences, or combinations of keywords.
[0054] For example, the original question text from the user can be obtained through a front-end webpage, application (APP), application programming interface (API), or file upload.
[0055] The tested model can refer to any large language model with autoregressive text generation capabilities. Its deployment can be a local inference service or a cloud interface, and the weight parameters are frozen during the inference phase and used only for generation. In this application, the tested model is used to generate original answer text based on the original question text, which serves as the test object for subsequent hallucination detection, thereby determining the hallucination status of the output text of the tested model (i.e., the original answer text).
[0056] The original answer text can refer to the complete string returned by the tested model after performing a single forward inference on the original question text. It should be understood that there is a one-to-one correspondence between the original question text and the original answer text.
[0057] In step S102, a variant question text can refer to natural language text obtained by rewording the original question text, which is semantically equivalent to the original question text and has the same expected answer. It is used to prompt the model under test to generate the answer from multiple perspectives without changing the core semantics, so as to expose potential factual conflicts or differences in detail in the output text of the model under test.
[0058] Furthermore, by inputting each variant question text into the model under test, we can obtain the variant answer text that corresponds one-to-one with the variant question text.
[0059] It can be understood that, compared with the original question text, a variant question text changes only the form of expression and not the semantic content, and all variant question texts are independently input into the same model under test, which keeps the core semantics of the inputs uniform and the outputs comparable.
[0060] In step S103, the referee model can refer to another large language model that has no overlap with the model under test in terms of parameters, weights, training data or model architecture and is dedicated to judging text consistency.
[0061] It should be understood that the average accuracy of the judge model on the general fact-judgment benchmark is no lower than that of the tested model, in order to ensure the reliability of the hallucination detection results.
[0062] Hallucination detection results can refer to numerical or Boolean values obtained from the consistency assessment between the original answer text and each variant answer text, used to quantify or qualitatively describe the deviation of the original answer text from the facts.
[0063] The hallucination detection method provided in this application expands the answer difference space by replacing repeated inputs of the same question with semantically similar variant question texts, making potential factual conflicts easier to identify. By comparing the original answer text with multiple variant answer texts using a referee model independent of the tested model, the hallucination detection result is determined. This avoids missed detections due to knowledge bias during self-verification by the tested model, thereby improving the accuracy of hallucination detection. The method of this application improves the effectiveness of hallucination detection.
[0064] Figure 2 is a second flowchart of the hallucination detection method provided in this application. As shown in Figure 2, this embodiment describes the hallucination detection method in detail on the basis of the embodiment of Figure 1, and includes:
[0065] S201. Obtain the original question text from the user and input the original question text into the model under test to obtain the output original answer text; wherein, the model under test is a large language model.
[0066] S202. Based on the original question text, determine multiple variant question texts; wherein the variant question texts are semantically similar to the original question text.
[0067] In one alternative implementation, step S202 may include:
[0068] Based on the original question text and the preset first prompt text, multiple variant question texts are determined using the referee model; the preset first prompt text is used to guide the referee model to output multiple variant question texts based on the original question text.
[0069] The preset first prompt text can refer to a fixed template string stored in the local configuration or a database, which is used to drive the referee model to perform the task of outputting multiple variant question texts. For example, the preset first prompt text includes instructions, examples and placeholders.
[0070] For example, the preset first prompt text could be:
[0071] "Use metamorphic testing to generate 10 variants of the given question. Ensure that each variant is reworded with different vocabulary while retaining the original meaning. Examples are provided below for reference."
[0072] Example question: Was Johnny Depp born in Boston?
[0073] Example variant questions:
[0074] 1. Is Johnny Depp from Boston?
[0075] 2. Is Boston Johnny Depp's birthplace?
[0076] 3. Was Johnny Depp born in Boston?
[0077] Actual question: {original question}
[0078] Variant questions:
[0079] Return the numbered list of variant questions directly. Do not add any content before or after the list.
[0080] Here, {original question} is a placeholder, which is replaced with the actual original question text in practice.
[0081] Specifically, by inputting the original question text and the preset first prompt text into the referee model, multiple variant question texts can be obtained from the output of the referee model.
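Filling the placeholder and reading back the numbered list can be sketched as follows. This is a minimal illustration under stated assumptions: the template is abbreviated, the names `FIRST_PROMPT_TEMPLATE`, `build_first_prompt`, and `parse_numbered_list` are hypothetical, and the referee model call itself is left out because this application does not fix a particular interface.

```python
import re
from typing import List

# Abbreviated, hypothetical version of the preset first prompt text.
FIRST_PROMPT_TEMPLATE = (
    "Use metamorphic testing to generate 10 variants of the given question.\n"
    "Actual question: {original question}\n"
    "Variant questions:\n"
    "Return the numbered list of variant questions directly."
)


def build_first_prompt(question: str) -> str:
    # Plain string replacement keeps the placeholder exactly as written above.
    return FIRST_PROMPT_TEMPLATE.replace("{original question}", question)


def parse_numbered_list(reply: str) -> List[str]:
    """Extract items from lines such as '1. ...' or '2) ...'."""
    variants = []
    for line in reply.splitlines():
        match = re.match(r"^\s*\d+[.)]\s*(.+?)\s*$", line)
        if match:
            variants.append(match.group(1))
    return variants


prompt = build_first_prompt("Was Johnny Depp born in Boston?")
# A made-up referee reply, used only to exercise the parser.
sample_reply = "1. Is Johnny Depp from Boston?\n\n2. Is Boston Johnny Depp's birthplace?\n"
parsed = parse_numbered_list(sample_reply)
```

Lines that do not start with a number are ignored, so stray text before or after the list does not end up among the variant question texts.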
[0082] It should be noted that the referee model used to output the variant question texts and the referee model used to determine the hallucination detection result can be the same model or different models, as long as both are independent of the model under test, which keeps the judging perspective objective.
[0083] It can be understood that by using a referee model to generate the variant question texts, a highly diverse, semantics-preserving set of rewrites can be obtained in one pass, without manually maintaining a thesaurus or rule base. This improves variant coverage and provides richer samples for the subsequent consistency comparison, improving the recall (i.e., the ability to capture hallucinations) and accuracy of hallucination detection, and thus its effectiveness.
[0084] In an optional implementation, step S202 may further include:
[0085] Based on preset reconstruction and combination rules, the original question text is reconstructed to generate multiple variant question texts; wherein the preset reconstruction and combination rules include one or more of synonym replacement, syntax inversion, and word order rearrangement.
[0086] Among them, the preset reconstruction combination rules can refer to a set of text rewriting operators that are pre-installed in the local configuration file or rule base and can be executed offline. They can quickly generate semantically equivalent question variants without calling external models.
[0087] Synonym replacement can refer to using a pre-defined thesaurus or domain thesaurus to replace content words (nouns, verbs, adjectives) in the original question text with synonym entries according to their parts of speech, while keeping the other components unchanged.
[0088] Syntactic inversion refers to the interchange of active and passive voice, or the conversion of a subject-verb-object structure into an object-subject-verb structure, while maintaining the semantic roles. Specifically, a preset syntactic inversion template can be used to invert the syntax of the original question text. For example, the original question text "Who wrote *Dream of the Red Chamber*?" can be syntactically inverted to "By whom was *Dream of the Red Chamber* written?"
[0089] Word order rearrangement refers to moving adverbial phrases, attributive phrases, or prepositional phrases to different positions without violating grammatical rules. Specifically, a preset word order rearrangement template can be used to rearrange the word order of the original question text. For example, the original question text "What is the temperature in City A today?" can be rearranged to "Today, what is the temperature in City A?".
[0090] It should be understood that the preset reconstruction and combination rules can be used individually or in combination, for example applied in the order "synonym replacement, syntax inversion, word order rearrangement", to further increase the diversity of variants. The output of each step of a combination is checked for validity to ensure that it is still a fluent question.
[0091] In one possible implementation, the pre-defined reconstruction and combination rules may include, but are not limited to, number form conversion, entity abbreviation expansion or contraction, negative sentence transformation, quantifier and unit replacement, etc.
[0092] Number form conversion can refer to the mutual replacement of Arabic numerals and spelled-out numerals, for example, replacing "2026" with "two thousand and twenty-six";
[0093] Entity abbreviation expansion or contraction can refer to the mutual replacement of the abbreviated forms and full names of common institutions and technical terms, for example, replacing "LLM" with "Large Language Model";
[0094] Negative sentence transformation can refer to replacing the negative words or interrogative words in an interrogative sentence, for example, interchanging "whether it has been released", "has it been released", and "is it released";
[0095] Quantifier and unit replacement can refer to replacing measurement units or common quantifiers with synonymous or equivalent expressions, for example, replacing "how many degrees" with "how many degrees Celsius", and replacing "how many km" with "how many kilometers".
[0096] It can be understood that by determining the variant question texts from preset reconstruction and combination rules, diverse questions can be generated quickly offline, without a network connection or model inference, which reduces the dependence on model calls, improves generation efficiency and deployment flexibility, and thus suits high-concurrency, low-latency hallucination detection scenarios.
[0097] S203. Input each variant question text into the model under test, and obtain the output variant answer text that corresponds one-to-one with the variant question text.
[0098] S204. For each variant answer text, determine the variant answer text and the original answer text as an answer pair, and based on the referee model, determine the evaluation result of the answer pair; wherein the evaluation result characterizes the degree of consistency between the variant answer text and the original answer text.
[0099] An answer pair is obtained by combining one variant answer text with the original answer text.
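A rule-based rewriter of this kind can be sketched with two of the rules described above, synonym replacement and abbreviation expansion, applied individually and in combination. The lookup tables and the function names `replace_words` and `rule_based_variants` are hypothetical placeholders; a real rule base would be far larger and would also cover syntax inversion and word order rearrangement.

```python
import re
from typing import Dict, List

# Tiny hypothetical rule tables; a real deployment would load these from
# a configuration file or rule base.
SYNONYMS: Dict[str, str] = {"wrote": "authored"}
ABBREVIATIONS: Dict[str, str] = {"LLM": "large language model"}


def replace_words(text: str, table: Dict[str, str]) -> str:
    """Replace whole words found in `table` (case-sensitive)."""
    pattern = r"\b(" + "|".join(re.escape(key) for key in table) + r")\b"
    return re.sub(pattern, lambda m: table[m.group(0)], text)


def rule_based_variants(question: str) -> List[str]:
    """Apply each rule alone, then both in sequence, dropping no-op results."""
    candidates = [
        replace_words(question, SYNONYMS),
        replace_words(question, ABBREVIATIONS),
        replace_words(replace_words(question, SYNONYMS), ABBREVIATIONS),
    ]
    variants: List[str] = []
    for candidate in candidates:
        # Keep only rewrites that changed the text and are not duplicates.
        if candidate != question and candidate not in variants:
            variants.append(candidate)
    return variants


variants = rule_based_variants("Who wrote the first LLM paper?")
```

Because no model is called, the rewriter runs offline; the trade-off is that coverage is limited to whatever the tables contain.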
[0100] In an optional implementation manner, determining the evaluation result of the answer pair based on the referee model may include:
[0101] Based on the answer pair and the preset second prompt text, determine the evaluation result of the answer pair using the referee model; wherein the preset second prompt text is used to guide the referee model to output the evaluation result of the answer pair.
[0102] Among them, the preset second prompt text can refer to a fixed template string stored in the local configuration or database, which is used to drive the referee model to execute the task of outputting the evaluation result of the answer pair. Exemplarily, the preset second prompt text includes instructions, output format requirements, and placeholders.
[0103] For example, the preset second prompt text can be:
[0104] Your task is to choose one output from the following three options: Yes, No, or Unsure.
[0105] Given two sentences, determine whether sentence A can be proven true based on sentence B.
[0106] Assume that sentence B is true in fact.
[0107] Please choose one of the following three values to answer:
[0108] Yes: Sentence A is completely proven to be true by sentence B.
[0109] No: Sentence A is completely proven to be false by sentence B.
[0110] Unsure: Sentence B provides no information that either proves or refutes sentence A.
[0111] Remember: Do not reply with anything other than "Yes", "No", or "Unsure". No explanations, reasons, or additional text are required.
[0112] Sentence A: {Sentence A}
[0113] Sentence B: {Sentence B}.
[0114] Here, {Sentence A} and {Sentence B} are placeholders. In practice, {Sentence A} is replaced with the actual original answer text, and {Sentence B} is replaced with the actual variant answer text.
[0115] Specifically, by inputting the answer pair and the preset second prompt text into the judge model, the evaluation result of the answer pair output by the judge model can be obtained.
[0116] The evaluation result of the answer pair characterizes the degree of consistency between the variant answer text and the original answer text. Specifically, the evaluation result of the answer pair can refer to the discrete label that the referee model assigns to the factual consistency relationship between the variant answer text and the original answer text.
[0117] In this example, the evaluation result of an answer pair is one of "Yes", "No", and "Unsure". Specifically, an evaluation result of "Yes" indicates that the variant answer text fully supports the correctness of the original answer text both semantically and factually; an evaluation result of "No" indicates that the variant answer text conflicts with the original answer text or does not support its correctness at all; an evaluation result of "Unsure" indicates that the variant answer text does not contain relevant information sufficient to determine the correctness of the original answer text.
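Evaluating one answer pair can be sketched as filling the two placeholders, calling the referee model, and normalizing its reply to one of the three labels. The abbreviated template, the function names, and the fallback of treating off-format replies as "Unsure" are assumptions made for illustration; the referee is passed in as a plain callable because this application does not prescribe an interface.

```python
from typing import Callable

# Abbreviated, hypothetical version of the preset second prompt text.
SECOND_PROMPT_TEMPLATE = (
    "Given two sentences, determine whether sentence A can be proven true "
    "based on sentence B. Reply only with Yes, No, or Unsure.\n"
    "Sentence A: {Sentence A}\n"
    "Sentence B: {Sentence B}"
)

# Accepted spellings mapped to canonical labels; "uncertain" is an alias.
LABEL_ALIASES = {"yes": "Yes", "no": "No", "unsure": "Unsure", "uncertain": "Unsure"}


def normalize_label(raw_reply: str) -> str:
    """Map a raw referee reply to 'Yes', 'No', or 'Unsure'."""
    cleaned = raw_reply.strip().strip(".\"'").lower()
    # Assumption: any reply outside the expected set is treated as 'Unsure'.
    return LABEL_ALIASES.get(cleaned, "Unsure")


def evaluate_pair(
    original_answer: str, variant_answer: str, referee: Callable[[str], str]
) -> str:
    prompt = SECOND_PROMPT_TEMPLATE.replace("{Sentence A}", original_answer)
    prompt = prompt.replace("{Sentence B}", variant_answer)
    return normalize_label(referee(prompt))


# Stub referee (hypothetical) that always answers with stray formatting.
label = evaluate_pair("He was born in Boston.", "Boston is his birthplace.", lambda p: " yes.\n")
```

Normalizing the reply guards against minor formatting drift such as trailing periods or different capitalization.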
[0118] It can be understood that by using a referee model to determine the evaluation results of the answer pairs, an external judging perspective independent of the model under test is introduced. This avoids the missed detections that occur in self-verification because of the knowledge bias of the model under test, improving the accuracy and recall of hallucination detection, and thus its effectiveness.
[0119] In one possible implementation, determining the evaluation result of an answer pair may also include:
[0120] For each answer pair, determine the vector representations of the variant answer text and the original answer text, and determine the cosine similarity between the two vector representations. The cosine similarity is then used as the evaluation result of the answer pair.
[0121] Here, a single sentence-vector encoding model can be used to encode both the variant answer text and the original answer text, so that the two vector representations lie in the same semantic space and are comparable.
[0122] It is understandable that by using cosine similarity as the evaluation result, batch calculations can be completed in milliseconds without having to call large models to generate text labels again, thereby significantly reducing latency and computing costs, while maintaining high semantic discrimination accuracy and improving the real-time performance and economy of hallucination detection.
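The cosine similarity step can be sketched as follows. The sketch assumes the two answer texts have already been encoded into equal-length vectors by some sentence encoder, which is not shown. The function name and the choice to return 0.0 for a zero vector are illustrative assumptions. The text also does not specify how the similarity is subsequently converted into an evaluation score, so that step is left out.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a zero vector carries no direction, so report 0.0.
        return 0.0
    return dot / (norm_u * norm_v)
```

Vectors pointing in the same direction give a value close to 1, and orthogonal vectors give 0.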
[0123] S205. Determine the hallucination detection results based on the evaluation results of each answer pair.
[0124] In one alternative implementation, step S205 may include:
[0125] S2051. For each evaluation result, determine the evaluation score corresponding to the evaluation result based on a preset association; wherein the evaluation score is a quantitative value representing the contribution of the evaluation result to the hallucination judgment.
[0126] The preset association can be a hard-coded mapping table stored in the local configuration. It converts discrete evaluation result labels into continuous or discrete quantitative scores, ensuring that the relative weights with which different labels contribute to the hallucination judgment remain consistent.
[0127] The evaluation score can be a floating-point number or an integer. Its value is positively correlated with the likelihood that the original answer text is a hallucination. The higher the evaluation score, the more likely the original answer text is to indicate a hallucination.
[0128] For example, the preset associations include:
[0129] a. When the evaluation result is "Yes", the evaluation score of the answer pair is 0;
[0130] b. When the evaluation result is "No", the evaluation score of the answer pair is 1;
[0131] c. When the evaluation result is "Uncertain", the evaluation score of the answer pair is 0.5.
[0132] It is understandable that, based on pre-defined associations, interpretable and comparable quantitative indicators of the evaluation results of answer pairs can be quickly obtained without additional training, providing a unified scale for subsequent processing steps.
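The preset association from the example above can be written as a plain lookup table. The numeric values are those given in the text. The names `PRESET_ASSOCIATION` and `evaluation_score`, the case-insensitive lookup, and raising an error on an unknown label are assumptions made for illustration.

```python
# Label-to-score table taken from the example in the text.
PRESET_ASSOCIATION = {"Yes": 0.0, "No": 1.0, "Uncertain": 0.5}


def evaluation_score(label: str) -> float:
    """Return the evaluation score for one evaluation result label.

    Assumption: labels are matched case-insensitively, and an unknown
    label is rejected rather than silently mapped to a default.
    """
    key = label.strip().capitalize()
    if key not in PRESET_ASSOCIATION:
        raise ValueError(f"unknown evaluation result: {label!r}")
    return PRESET_ASSOCIATION[key]
```

Because the table is static configuration, changing the relative weight of a label requires no retraining, only an edit to the mapping.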
[0133] S2052. Determine the hallucination score based on the evaluation scores; wherein the hallucination score represents the probability that the original answer text is a hallucination.
[0134] The hallucination score can be an aggregate index that comprehensively evaluates the consistency level across all answer pairs. It condenses the discrete evaluation results into a single continuous value, which is convenient for the subsequent threshold determination.
[0135] For example, the hallucination score is the arithmetic mean of all evaluation scores and lies in the range [0, 1]. A higher value indicates a higher probability that the original answer text is a hallucination. For instance, the hallucination score satisfies:
[0136] $$H = \frac{1}{N}\sum_{i=1}^{N} s_i,$$
[0137] where $H$ denotes the hallucination score, $N$ denotes the number of evaluation scores (i.e., the number of answer pairs or variant answer texts), and $s_i$ denotes the i-th evaluation score.
[0138] It is understandable that using the arithmetic mean can smooth out the random error of a single judgment while maintaining monotonicity, making the hallucination score repeatable and interpretable, and can be directly compared with a unified judgment threshold, thereby achieving fast, stable and more accurate hallucination detection.
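As a worked illustration of the arithmetic mean, suppose four answer pairs are judged "Yes", "No", "Uncertain" and "Yes". With the example mapping, the evaluation scores are 0, 1, 0.5 and 0, so the hallucination score is 1.5 / 4 = 0.375. The function below is a minimal sketch. Its name and its rejection of an empty score list are assumptions.

```python
from typing import Sequence


def hallucination_score(evaluation_scores: Sequence[float]) -> float:
    """Arithmetic mean of the evaluation scores of all answer pairs."""
    if not evaluation_scores:
        # Assumption: with no answer pairs there is nothing to average.
        raise ValueError("at least one evaluation score is required")
    return sum(evaluation_scores) / len(evaluation_scores)
```

If every evaluation score lies in [0, 1], the mean also lies in [0, 1], which is what allows a single judgment threshold to be applied afterwards.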
[0139] S2053. Determine the hallucination detection result based on the hallucination score.
[0140] In an optional implementation, step S2053 may include:
[0141] Determine the judgment threshold; if the hallucination score is greater than or equal to the judgment threshold, determine that the hallucination detection result indicates that the original answer text is hallucination text.
[0142] The judgment threshold refers to the boundary value for converting the hallucination score into a Boolean judgment result. If the hallucination score is greater than or equal to the judgment threshold, it indicates that the overall consistency between the original answer text and each variant answer text is low and there is a significant factual conflict. In this case, the hallucination detection result indicates that the original answer text is a hallucination text. If the hallucination score is less than the judgment threshold, it indicates that the overall consistency between the original answer text and each variant answer text is acceptable and the factual conflict does not reach a significant level. In this case, the hallucination detection result indicates that the original answer text is not a hallucination text.
[0143] For example, the judgment threshold can be preset by staff based on relevant experience, such as a judgment threshold of 0.5.
[0144] In one possible implementation, the judgment threshold can also be determined as follows:
[0145] Based on the original question text, determine the question domain type; based on the question domain type, determine the judgment threshold.
[0146] Specifically, the executing entity of this application can identify the domain of the original question text through keyword matching or a lightweight classification model to obtain a question domain type such as "medical", "law", "finance", "education" or "open domain". It then queries a preset "domain-threshold" mapping table and automatically selects the corresponding judgment threshold. This mapping table is configured offline by operation and maintenance personnel according to the error tolerance of each domain and can be hot-updated.
[0147] For example, if the question domain type is "medical", where factual errors may endanger life and health, the judgment threshold in the mapping table can be configured lower, such as 0.3, to improve the recall rate. If the question domain type is "open domain", where the error tolerance is relatively high, the judgment threshold in the mapping table can be configured higher, such as 0.6, to reduce the false positive rate and avoid unnecessary manual review.
[0148] It is understandable that with a dynamic threshold mechanism driven by the question domain type, stricter judgment criteria can be adopted in high-risk domains to reduce missed detections, while the threshold can be relaxed in low-risk domains to reduce false alarms. This balances security and user experience in multi-domain deployment scenarios, thereby improving the practicality and operability of the hallucination detection method.
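The domain-driven threshold can be sketched with keyword matching and a lookup table. The thresholds 0.3 ("medical") and 0.6 ("open domain") and the general example value 0.5 come from the text. Everything else is a hypothetical illustration: the keyword list, the function names, and the choice to treat any unmatched question as "open domain".

```python
# Thresholds stated in the text; other domains are not given values there.
DOMAIN_THRESHOLDS = {"medical": 0.3, "open domain": 0.6}
# Used only if a recognised domain has no configured threshold.
DEFAULT_THRESHOLD = 0.5
# Hypothetical keywords, for illustration only.
DOMAIN_KEYWORDS = {"medical": ("symptom", "dosage", "diagnosis")}


def classify_domain(question: str) -> str:
    """Keyword-based domain detection; unmatched questions are open domain."""
    text = question.lower()
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return domain
    return "open domain"


def judgment_threshold(question: str) -> float:
    """Look up the judgment threshold for the question's domain."""
    return DOMAIN_THRESHOLDS.get(classify_domain(question), DEFAULT_THRESHOLD)


def is_hallucination(score: float, question: str) -> bool:
    """Flag a hallucination when the score reaches the domain threshold."""
    return score >= judgment_threshold(question)
```

With these values, a hallucination score of 0.4 is flagged for a medical question but not for an open-domain one, which reflects the recall-oriented setting for high-risk domains.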
[0149] It should be understood that the value range of the hallucination score and the corresponding judgment threshold can be set flexibly according to actual business needs, but the judgment logic always preserves the monotonic relationship that a higher score means a greater likelihood of hallucination.
[0150] It is understandable that by determining an appropriate judgment threshold, the hallucination score is converted into an actionable Boolean signal that serves as the hallucination detection result. This effectively reduces false and missed detections in high-risk domains, thereby improving the security and reliability of hallucination detection.
[0151] It is understandable that by quantifying the evaluation results into evaluation scores to determine the hallucination score and then the hallucination detection result, the overall detection stability and repeatability can be improved, achieving high-precision and interpretable hallucination detection, thereby improving the effectiveness of hallucination detection.
[0152] It is understandable that by determining the evaluation results of each answer pair to determine the hallucination detection results, a comprehensive multi-angle detection can be carried out, thereby more comprehensively discovering potential factual errors in the original answer text, improving the accuracy and recall of hallucination detection, and thus improving the effectiveness of hallucination detection.
[0153] The method described in this application improves the effectiveness of hallucination detection.
[0154] Figure 3 is a schematic diagram of the architecture of the hallucination detection method provided in this application. This embodiment is described in conjunction with the foregoing embodiments. As shown in Figure 3, this embodiment of the application constructs a dual-model verification loop by means of a referee model that is independent of the model under test.
[0155] Specifically, at the start of the architecture, the original question text asked by the user (also referred to as the input question) is input into the model under test (also referred to as the tested large model), which outputs the original answer text corresponding to the original question text (also referred to as the initial answer). At the same time, the input question is input into the referee model (also referred to as the judge large model), which outputs multiple variant question texts corresponding to the input question (also referred to as variant questions). The variant question texts are then input into the tested large model to obtain the variant answer text corresponding to each variant question text (also referred to as a variant answer). Next, the initial answer and each variant answer form an answer pair, and each answer pair is input into the judge large model for comparison and evaluation, yielding the evaluation result of each answer pair (also referred to as the consistency evaluation output). The evaluation result of an answer pair is one of "Yes", "No" and "Uncertain". Finally, the hallucination score is calculated based on the evaluation results of the answer pairs, and the hallucination detection result (also referred to as the hallucination judgment result) is determined based on the hallucination score.
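The dual-model loop described for Figure 3 can be sketched end to end with the models passed in as plain callables. This is an illustrative sketch, not the application's implementation: `tested_model`, `make_variants` and `judge_pair` are hypothetical stand-ins for calls to the model under test and the referee model, and the label scores and the default threshold of 0.5 are the example values from the text.

```python
from typing import Callable, Sequence, Tuple

# Example label-to-score mapping from the text.
LABEL_SCORES = {"Yes": 0.0, "No": 1.0, "Uncertain": 0.5}


def detect_hallucination(
    question: str,
    tested_model: Callable[[str], str],
    make_variants: Callable[[str], Sequence[str]],
    judge_pair: Callable[[str, str], str],
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """Return (hallucination score, whether the original answer is flagged)."""
    # The model under test answers the original question.
    original_answer = tested_model(question)
    # The referee side produces semantically similar variant questions.
    variants = make_variants(question)
    if not variants:
        raise ValueError("at least one variant question is required")
    # The model under test answers each variant; the referee labels each pair.
    labels = [judge_pair(original_answer, tested_model(v)) for v in variants]
    # Labels are mapped to scores and averaged into the hallucination score.
    score = sum(LABEL_SCORES[label] for label in labels) / len(labels)
    return score, score >= threshold
```

Passing the models as callables keeps the control flow testable with simple stubs and makes explicit that the referee is a separate component from the model under test.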
[0156] The method described in this application improves the effectiveness of hallucination detection.
[0157] Figure 4 is a schematic diagram of the hallucination detection device provided in this application. As shown in Figure 4, the hallucination detection device 40 provided in this embodiment includes: an acquisition module 401, a determination module 402, and a detection module 403.
[0158] The acquisition module 401 is used to acquire the original question text from the user and input the original question text into the model under test to obtain the output original answer text; wherein, the model under test is a large language model;
[0159] The determination module 402 is used to determine multiple variant question texts based on the original question text, and input each variant question text into the model under test to obtain the output variant answer texts that correspond one-to-one with the variant question texts; wherein, the variant question texts are semantically similar to the original question texts.
[0160] The detection module 403 is used to determine the hallucination detection result based on the original answer text, each variant answer text and the referee model; wherein, the referee model is a pre-trained large language model and is independent of the tested model, and the hallucination detection result represents the hallucination situation of the output text of the tested model.
[0161] In an optional example, the determination module 402 is further configured to determine multiple variant question texts based on the original question text and a preset first prompt text, using the referee model; wherein the preset first prompt text is used to guide the referee model to output multiple variant question texts based on the original question text.
[0162] In an optional example, the determination module 402 is also used to reconstruct the original problem text based on preset reconstruction combination rules to generate multiple variant problem texts; wherein the preset reconstruction combination rules include one or more of synonym replacement, syntactic inversion, and word order rearrangement.
[0163] In an optional example, the detection module 403 is further configured to: for each variant answer text, determine that the variant answer text and the original answer text form an answer pair, and determine the evaluation result of the answer pair based on the referee model, wherein the evaluation result characterizes the degree of consistency between the variant answer text and the original answer text; and
[0164] determine the hallucination detection result based on the evaluation results of the answer pairs.
[0165] In an optional example, the detection module 403 is further configured to determine the evaluation result of the answer pair based on the judge model, according to the answer pair and the preset second prompt text; wherein the preset second prompt text is used to guide the judge model to output the evaluation result of the answer pair.
[0166] In an optional example, the detection module 403 is further configured to: for each evaluation result, determine the evaluation score corresponding to the evaluation result based on a preset association, wherein the evaluation score is a quantitative value representing the contribution of the evaluation result to the hallucination judgment;
[0167] determine a hallucination score based on the evaluation scores, wherein the hallucination score represents the probability that the original answer text is a hallucination; and
[0168] determine the hallucination detection result based on the hallucination score.
[0169] In an optional example, the detection module 403 is further configured to: determine a judgment threshold; and
[0170] if the hallucination score is greater than or equal to the judgment threshold, determine that the hallucination detection result indicates that the original answer text is hallucination text.
[0171] The hallucination detection device provided in this embodiment can execute the method provided in the above method embodiment. Its implementation principle and technical effect are similar, and will not be described in detail here.
[0172] Figure 5 is a schematic diagram of the structure of the electronic device provided in this application. As shown in Figure 5, the electronic device 50 provided in this embodiment includes at least one processor 501 and a memory 502. Optionally, the electronic device 50 further includes a communication component 503. The processor 501, memory 502, and communication component 503 are connected via a bus 504.
[0173] In a specific implementation, the at least one processor 501 executes computer-executable instructions stored in the memory 502, causing the at least one processor 501 to perform the above-described method.
[0174] The specific implementation process of processor 501 can be found in the above method embodiments, and its implementation principle and technical effect are similar. It will not be repeated here.
[0175] In the above embodiments, it should be understood that the processor can be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), etc. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the method disclosed in this invention can be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules within the processor.
[0176] The memory may include random access memory (RAM) and may also include non-volatile memory (NVM), such as at least one disk storage device.
[0177] The bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, etc. Buses can be categorized as address buses, data buses, control buses, etc. For ease of illustration, the buses shown in the accompanying drawings are not limited to a single bus or a single type of bus.
[0178] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the above-described method.
[0179] This application also provides a computer-readable storage medium storing computer-executable instructions, which, when executed by a processor, implement the above-described method.
[0180] The aforementioned readable storage medium can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk. The readable storage medium can be any available medium accessible to a general-purpose or special-purpose computer.
[0181] An exemplary readable storage medium is coupled to a processor, enabling the processor to read information from and write information to the readable storage medium. Of course, the readable storage medium can also be a component of the processor. The processor and the readable storage medium can reside in an Application Specific Integrated Circuit (ASIC). Alternatively, the processor and the readable storage medium can exist as discrete components in the device.
[0182] The division of units is merely a logical functional division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or other forms.
[0183] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0184] In addition, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.
[0185] If a function is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0186] Those skilled in the art will understand that all or part of the steps of the above-described method embodiments can be implemented by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium. When executed, the program performs the steps of the above-described method embodiments; and the aforementioned storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, or optical disks.
[0187] Finally, it should be noted that other embodiments of the invention will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention that follow the general principles of the invention and include common knowledge or customary techniques in the art not disclosed herein, and is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of the invention is limited only by the appended claims.
Claims
1. A hallucination detection method, characterized in that it comprises: obtaining an original question text from a user and inputting the original question text into a model under test to obtain an output original answer text, wherein the model under test is a large language model; determining multiple variant question texts based on the original question text, and inputting each variant question text into the model under test to obtain variant answer texts in one-to-one correspondence with the variant question texts, wherein each variant question text is semantically similar to the original question text; and determining a hallucination detection result based on the original answer text, each variant answer text, and a referee model, wherein the referee model is a large language model independent of the model under test, and the hallucination detection result characterizes the hallucination status of the output text of the model under test.
2. The method according to claim 1, characterized in that determining multiple variant question texts based on the original question text comprises: determining the multiple variant question texts by means of the referee model, based on the original question text and a preset first prompt text; wherein the preset first prompt text is used to guide the referee model to output multiple variant question texts based on the original question text.
3. The method according to claim 1, characterized in that determining multiple variant question texts based on the original question text comprises: reconstructing the original question text based on preset reconstruction combination rules to generate the multiple variant question texts; wherein the preset reconstruction combination rules include one or more of synonym replacement, syntactic inversion, and word order rearrangement.
4. The method according to any one of claims 1-3, characterized in that determining the hallucination detection result based on the original answer text, each variant answer text, and the referee model comprises: for each variant answer text, determining that the variant answer text and the original answer text form an answer pair, and determining an evaluation result of the answer pair based on the referee model, wherein the evaluation result characterizes the degree of consistency between the variant answer text and the original answer text; and determining the hallucination detection result based on the evaluation results of the answer pairs.
5. The method according to claim 4, characterized in that determining the evaluation result of the answer pair based on the referee model comprises: determining the evaluation result of the answer pair by means of the referee model, based on the answer pair and a preset second prompt text; wherein the preset second prompt text is used to guide the referee model to output the evaluation result of the answer pair.
6. The method according to claim 4, characterized in that determining the hallucination detection result based on the evaluation results of the answer pairs comprises: for each evaluation result, determining an evaluation score corresponding to the evaluation result based on a preset association, wherein the evaluation score is a quantitative value representing the contribution of the evaluation result to the hallucination judgment; determining a hallucination score based on the evaluation scores, wherein the hallucination score represents the probability that the original answer text is a hallucination; and determining the hallucination detection result based on the hallucination score.
7. The method according to claim 6, characterized in that determining the hallucination detection result based on the hallucination score comprises: determining a judgment threshold; and if the hallucination score is greater than or equal to the judgment threshold, determining that the hallucination detection result indicates that the original answer text is hallucination text.
8. A hallucination detection device, characterized in that it comprises: an acquisition module, used to acquire an original question text from a user and input the original question text into a model under test to obtain an output original answer text, wherein the model under test is a large language model; a determination module, used to determine multiple variant question texts based on the original question text, and input each variant question text into the model under test to obtain variant answer texts in one-to-one correspondence with the variant question texts, wherein each variant question text is semantically similar to the original question text; and a detection module, used to determine a hallucination detection result based on the original answer text, each variant answer text, and a referee model, wherein the referee model is a pre-trained large language model independent of the model under test, and the hallucination detection result characterizes the hallucination status of the output text of the model under test.
9. An electronic device, characterized in that it comprises: a memory and a processor; wherein the memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the method according to any one of claims 1-7.
11. A computer program product, characterized in that it comprises a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.