Multimodal large model hallucination attack method and device based on text reference interference

By generating and color-interference-processed hallucination-triggered images, large-scale image recognition models are used for detection, which solves the problem of color recognition errors caused by textual reference interference and improves the comprehensiveness and reliability of hallucination detection.

CN122223367APending Publication Date: 2026-06-16BEIHANG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
BEIHANG UNIV
Filing Date
2026-03-05
Publication Date
2026-06-16

Smart Images

  • Figure CN122223367A_ABST
    Figure CN122223367A_ABST
Patent Text Reader

Abstract

Embodiments of the present disclosure disclose a method and device for hallucination attack on a multi-modal large model based on text reference interference. A specific embodiment of the method comprises: obtaining a first hallucination interference question text; generating a first interference answer text based on a preset color text information base; performing color rendering interference processing on the first interference answer text to obtain a first color interference answer text; generating a first hallucination interference text based on the first hallucination interference question text and the first color interference answer text; performing hallucination trigger image generation processing on the first hallucination interference text to obtain a first question hallucination trigger image; performing double color interference processing on the first question hallucination trigger image to obtain a first hallucination trigger image; performing hallucination detection processing on a preset image recognition large model to obtain first answer hallucination detection information and first text hallucination detection information; and generating hallucination detection score information based on the first answer hallucination detection information and the first text hallucination detection information. The embodiment improves the comprehensiveness and reliability of the hallucination attack on the multi-modal large model based on the text reference interference.
Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] The embodiments disclosed herein relate to the field of computer technology, and specifically to a method and apparatus for multimodal large-scale illusion attacks based on textual referential interference. Background Technology

[0002] With the widespread application of large-scale image recognition models in key fields such as medical image analysis, autonomous driving, and security monitoring, hallucination detection using these models requires assessing the reliability of the model's output compared to reality. Large-scale image recognition hallucination detection is a technique for evaluating and verifying the reliability of the model's output. Currently, the common methods for hallucination detection using large models are: thresholding the confidence score of the model's output or evaluating accuracy using a standard test set.

[0003] However, when using the above method to perform hallucination detection on large image recognition models, the following technical problems often arise: Hallucination detection can be performed by thresholding the confidence scores output by large image recognition models or by using only standard test sets for accuracy evaluation. Typically, images in the standard test sets do not induce modal conflict, and the model's hallucination detection results may appear normal. However, when performing hallucination detection on images with specific textual references, color recognition under modal conflict is easily interfered with by both textual bias and dynamic decision imbalance. Large image recognition models commonly exhibit the phenomenon of "blindly trusting the text in the image." Specific textual references in the image can interfere with the large image recognition model's ability to recognize and perceive colors, causing hallucinations and potentially resulting in abnormal hallucination detection results. Using standard test sets for accuracy evaluation for hallucination detection has flaws; hallucination detection is relatively one-sided, leading to low comprehensiveness and low reliability of hallucination detection results from large image recognition models in real-world applications.

[0004] The information disclosed in this background section is only intended to enhance the understanding of the background of the inventive concept, and therefore may contain information that does not form prior art known to those skilled in the art. Summary of the Invention

[0005] The summary portion of this disclosure is intended to provide a brief overview of the concepts, which will be described in detail in the detailed description portion. This summary portion is not intended to identify key or essential features of the claimed technical solutions, nor is it intended to limit the scope of the claimed technical solutions.

[0006] Some embodiments of this disclosure provide methods, apparatus, electronic devices, and computer-readable media for detecting large-scale image recognition illusions to address one or more of the technical problems mentioned in the background section above.

[0007] In a first aspect, some embodiments of this disclosure provide a multimodal large-scale illusion attack method based on textual referential interference. The method includes: acquiring a first illusion interference question text; generating a first interference answer text based on a preset color text information library; performing color rendering interference processing on the first interference answer text based on the preset color information library to obtain a first color interference answer text; generating a first illusion interference text based on the first illusion interference question text and the first color interference answer text; performing illusion trigger image generation processing on the first illusion interference text to obtain a first question illusion trigger image; performing dual color interference processing on the first question illusion trigger image to obtain a first illusion trigger image; performing illusion detection processing on a preset image recognition large-scale model based on the first question illusion trigger image and the first illusion trigger image to obtain first answer illusion detection information and first text illusion detection information; and generating illusion detection scoring information based on the first answer illusion detection information and the first text illusion detection information.

[0008] Secondly, some embodiments of this disclosure provide a multimodal large-scale illusion attack device based on textual referential interference. The device includes: an acquisition unit configured to acquire first illusion interference question text; a first generation unit configured to generate first interference answer text based on a preset color text information library; a color rendering interference processing unit configured to perform color rendering interference processing on the first interference answer text based on a preset color information library to obtain first color interference answer text; a second generation unit configured to generate first illusion interference text based on the first illusion interference question text and the first color interference answer text; and an illusion trigger image generation processing unit. The first hallucination interference text is configured to perform hallucination trigger image generation processing on the first hallucination interference text to obtain a first question hallucination trigger image; the dual color interference processing unit is configured to perform dual color interference processing on the first question hallucination trigger image to obtain a first hallucination trigger image; the hallucination detection processing unit is configured to perform hallucination detection processing on a preset image recognition large model based on the first question hallucination trigger image and the first hallucination trigger image to obtain first answer hallucination detection information and first text hallucination detection information; the third generation unit is configured to generate hallucination detection scoring information based on the first answer hallucination detection information and the first text hallucination detection information.

[0009] Thirdly, some embodiments of this disclosure provide an electronic device, including: one or more processors; and a storage device having one or more programs stored thereon, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any implementation of the first aspect above.

[0010] Fourthly, some embodiments of this disclosure provide a computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect above.

[0011] The above-described embodiments of this disclosure have the following beneficial effects: the multimodal large model illusion attack method based on textual reference interference in some embodiments of this disclosure improves the comprehensiveness and reliability of multimodal large model illusion attacks based on textual reference interference. Specifically, the reason why the results of multimodal large model illusion attacks based on textual reference interference have low comprehensiveness and low reliability in real applications is that: when performing illusion detection by thresholding the confidence score output by the image recognition large model or by only using a standard test set for accuracy evaluation, the images in the standard test set usually cannot cause modal conflict, and the illusion detection result output by the model may be normal. However, when performing illusion detection on images with special textual references, color recognition under modal conflict is easily interfered with by both textual bias and dynamic decision imbalance. Image recognition large models generally have the phenomenon of "blindly believing the text in the image." Special textual references in the image can interfere with the image recognition large model's ability to recognize and perceive colors, causing it to produce illusions, and the illusion detection result may be abnormal. The accuracy evaluation of hallucination detection using a standard test set has flaws, resulting in a one-sided hallucination detection approach. This leads to low comprehensiveness and reliability of hallucination detection results in large-scale image recognition models in real-world applications. Therefore, this disclosure presents a multimodal large-scale hallucination attack method based on textual referential interference. First, a first hallucination interference question text is obtained. Then, based on a preset color text information library, a first interference answer text is generated. This yields a first interference answer text whose meaning represents a certain color. Next, based on the preset color information library, the first interference answer text undergoes color rendering interference processing to obtain a first color interference answer text. This yields a first color interference answer text rendered with a color different from its literal meaning, inducing semantic referential hallucinations in the model and testing whether the model is misled by the literal meaning of the text. Then, based on the first hallucination interference question text and the first color interference answer text, a first hallucination interference text is generated. This yields a first hallucination interference text used to generate a first answer hallucination trigger image. Finally, the first hallucination interference text undergoes hallucination trigger image generation processing to obtain the first answer hallucination trigger image. Thus, a first answer illusion trigger image can be obtained to induce semantic referential illusion in the model and test whether the model is misled by the literal meaning of the text. Next, the first answer illusion trigger image is subjected to dual color interference processing to obtain a first illusion trigger image. This yields a first illusion trigger image with dual color interference to test whether the model can distinguish between the "color of the question itself" and the "color of the target the question points to." Subsequently, based on the first answer illusion trigger image and the first illusion trigger image, a pre-defined image recognition large-scale model is subjected to illusion detection processing to obtain first answer illusion detection information and first text illusion detection information.Therefore, the model's color recognition and perception capabilities can be scored based on the first answer hallucination detection information and the first text hallucination detection information. Finally, based on the aforementioned first answer hallucination detection information and the aforementioned first text hallucination detection information, hallucination detection score information is generated. Thus, hallucination detection score information characterizing the model's color recognition and perception capabilities on hallucinogenic images can be obtained. Because images capable of inducing semantic referential hallucinations are used for hallucination detection, and dual color interference is introduced, the model's ability to distinguish between the "color of the question itself" and the "color of the target the question points to" is tested. Using different types of images capable of inducing hallucinations expands the detection coverage, making hallucination detection more comprehensive. This avoids the potential masking issues that can arise from using only a standard test set that cannot induce modal conflict for hallucination detection, and more effectively exposes errors that the model may make in practical applications. Furthermore, this improves the comprehensiveness and reliability of multimodal large-model hallucination attacks based on text referential interference. Attached Figure Description

[0012] The above and other features, advantages, and aspects of the embodiments of this disclosure will become more apparent from the accompanying drawings and the following detailed description. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic, and elements are not necessarily drawn to scale.

[0013] Figure 1 This is a flowchart of some embodiments of the multimodal large-model illusion attack method based on textual referential interference according to the present disclosure; Figure 2 This is a schematic diagram of the structure of some embodiments of the multimodal large-scale illusion attack device based on textual referential interference according to the present disclosure; Figure 3 This is a schematic diagram of the structure of an electronic device suitable for implementing some embodiments of the present disclosure. Detailed Implementation

[0014] Embodiments of this disclosure will now be described in more detail with reference to the accompanying drawings. While some embodiments of this disclosure are shown in the drawings, it should be understood that this disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided to provide a more thorough and complete understanding of this disclosure. It should be understood that the accompanying drawings and embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of protection of this disclosure.

[0015] It should also be noted that, for ease of description, only the parts relevant to the invention are shown in the accompanying drawings. Unless otherwise specified, the embodiments and features described in this disclosure can be combined with each other.

[0016] It should be noted that the concepts of "first" and "second" mentioned in this disclosure are used only to distinguish different devices, modules or units, and are not used to limit the order of functions performed by these devices, modules or units or their interdependencies.

[0017] It should be noted that the terms "a" and "a plurality of" used in this disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless otherwise expressly indicated in the context, they should be understood as "one or more".

[0018] The names of messages or information exchanged between multiple devices in the embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

[0019] This disclosure will now be described in detail with reference to the accompanying drawings and embodiments.

[0020] Figure 1 The flowchart 100 illustrates some embodiments of a multimodal large model illusion attack method based on textual referential interference according to this disclosure. This multimodal large model illusion attack method based on textual referential interference includes the following steps: Step 101: Obtain the text of the first hallucination interference problem.

[0021] In some embodiments, the execution subject (e.g., a computing device) of the multimodal large model illusion attack method based on text referencing interference can acquire a first illusion interference question text pre-stored in a storage device. In practice, the execution subject can acquire the first illusion interference question text pre-stored in a storage device. This first illusion interference question text can be text used to trigger the model to recognize the color of text in an image. The storage device can be a hard disk drive, solid-state drive, etc., connected to the execution subject. For example, the first illusion interference question text could be "What color is the text?". Step 102: Generate the first distracting answer text based on the preset color text information database.

[0022] In some embodiments, the executing entity may generate a first distracting answer text based on a preset color text information library. In practice, the executing entity may generate the first distracting answer text based on a preset color text information library.

[0023] In some optional implementations of certain embodiments, the aforementioned execution entity may generate the first interfering answer text based on a preset color text information library through the following steps: The first step is to randomly extract color text information from a pre-defined color text information library. In practice, the execution entity can use the `sample` function to randomly extract color text information from the pre-defined color text information library. This color text information can be English words representing colors. The pre-defined color text information library can be a color database composed of English words representing various colors. For example, the color text information could be "yellow". Alternatively, the pre-defined color text information library could be CSS color names.

[0024] The second step is to identify the extracted color-coded text information as the first distractor answer text. In practice, the aforementioned implementing entity can identify the extracted color-coded text information as the first distractor answer text.

[0025] Step 103: Based on the preset color information library, perform color rendering interference processing on the first interference answer text to obtain the first color interference answer text.

[0026] In some embodiments, the execution entity may perform color rendering interference processing on the first interference answer text based on a preset color information library to obtain a first color interference answer text. The preset color information library may be a library composed of CSS color codes. Each color text information in the preset color text information library corresponds to each color information in the preset color information library.

[0027] In some optional implementations of certain embodiments, the aforementioned execution entity may perform color rendering interference processing on the aforementioned first interference answer text based on a preset color information library through the following steps to obtain the first color interference answer text: The first step is to determine the color information corresponding to the extracted color text information as the target color information. In practice, the aforementioned executing entity can determine the color information corresponding to the extracted color text information as the target color information. This color information can be a code representing a color. As an example, the extracted color text information can be yellow. The color information corresponding to the extracted color text information can be #FFFF00. The target color information can be #FFFF00 (i.e., yellow).

[0028] The second step involves updating the preset color information library based on the target color information to obtain an updated color information library. In practice, the executing entity can delete the target color information from the preset color information library to obtain the updated color information library. The updated color information library can be a color information library that is missing one color (i.e., the target color information) after deleting the target color information from the preset color library.

[0029] The third step is to randomly extract color information from the updated color information database. In practice, the executing entity can use the `sample` function to randomly extract color information from the updated color information database.

[0030] The fourth step is to identify the extracted color information as interference color information. In practice, the executing entity can identify the extracted color information as interference color information. The extracted color information can be any color information in the preset color information library other than the target color information. For example, the interference color information could be #800080 (i.e., purple).

[0031] Fifth, based on the aforementioned interfering color information, perform color rendering interference processing on the first interfering answer text to obtain the first color interfering answer text. In practice, the executing entity can set the CSS `color` property according to the interfering color information to obtain the first color interfering answer text. The first color interfering answer text can be a non-black first interfering answer text rendered with the color represented by the aforementioned interfering color information. For example, the first color interfering answer text can be a purple first interfering answer text.

[0032] Step 104: Generate the first illusion interference text based on the first illusion interference question text and the first color interference answer text.

[0033] In some embodiments, the execution entity can generate the first illusion interference text based on the first illusion interference question text and the first color interference answer text. In practice, the execution entity can combine the first illusion interference question text and the first color interference answer text to obtain the first illusion interference text. The first illusion interference text can guide attention to the color attribute of the first illusion interference question text, and the first color interference answer text, whose semantics do not match its visual color, can cause color recognition interference to the large model.

[0034] Step 105: Perform hallucination trigger image generation processing on the first hallucination interference text to obtain the first problem hallucination trigger image.

[0035] In some embodiments, the aforementioned executing entity may perform hallucination trigger image generation processing on the aforementioned first hallucination interference text to obtain a first answer hallucination trigger image.

[0036] In some optional implementations of certain embodiments, the aforementioned execution entity may perform hallucination trigger image generation processing on the aforementioned first hallucination interference text through the following steps to obtain the first answer hallucination trigger image.

[0037] The first step is to perform random and diverse typesetting on the aforementioned first illusionary interference text to obtain random illusionary interference text. In practice, the executing entity can use JavaScript's `Math.random()` function to generate random numbers and set CSS properties such as `transform`, `letter-spacing`, `line-height`, and `font-size` (i.e., letter spacing, line spacing, and font size) to obtain random illusionary interference text. This random illusionary interference text can be text with highly random and diverse typesetting features (such as letter spacing, line spacing, and font size) obtained after performing random and diverse typesetting on the first illusionary interference text.

[0038] The second step involves coordinate mapping of the random illusion interference text to obtain a mapped illusion interference image. In practice, the execution entity first uses a Canvas to establish a three-dimensional canvas matrix (H, W, 3), where H represents the canvas height (in pixels), ranging from [300, 2000]. W represents the canvas width (in pixels), ranging from [400, 3000]. 3 represents the number of color channels, corresponding to the RGB three-channel color space. Then, the `Array.from(text).length` method is used to process the random illusion interference text to obtain the total number of characters. Next, the `measureText()` function is used to process the random illusion interference text to obtain the text width. Then, the `QFontMetrics` method is used to obtain the text height. Finally, half the difference between H (canvas height) and the text height is determined as the vertical center offset. Half the difference between W (canvas width) and the text width is determined as the horizontal center offset. Finally, based on the aforementioned vertical and horizontal center offsets, the `fillText()` method is used to render the random illusionary interference text, resulting in a mapped illusionary interference image. The aforementioned three-dimensional canvas matrix can be a logical representation of the canvas pixel data, with dimensions of height H, width W, and 3 RGB color channels. The aforementioned mapped illusionary interference image can be an image containing randomly positioned and randomly styled rendered characters to ensure the text doesn't get too close to the canvas edges; the format can be PNG or JPEG. The aforementioned vertical center offset represents the adjustment amount for centering the text vertically on the canvas. The aforementioned horizontal center offset represents the adjustment amount for centering the text horizontally on the canvas.

[0039] The third step involves smoothing the font edges of the aforementioned mapped illusion interference image to obtain a smoothed illusion interference image. In practice, the executing entity can use a Gaussian blur method to smooth the font edges of the mapped illusion interference image to obtain a smoothed illusion interference image. This smoothed illusion interference image can be an image with reduced noise after applying Gaussian blur to the mapped illusion interference image.

[0040] The fourth step is to identify the smoothed hallucination interference image as the first answer hallucination trigger image.

[0041] Step 106: Perform dual-color interference processing on the first hallucination trigger image to obtain the first hallucination trigger image.

[0042] In some embodiments, the aforementioned executing entity may perform dual-color interference processing on the aforementioned first answer hallucination trigger image to obtain the first hallucination trigger image.

[0043] In some optional implementations of certain embodiments, the aforementioned execution entity may perform dual-color interference processing on the aforementioned first answer hallucination trigger image through the following steps to obtain the first hallucination trigger image: The first step involves updating the color information library based on the aforementioned interfering color information, resulting in an updated color information library. In practice, the executing entity can remove the interfering color information from the updated color information library to obtain the updated color information library. Specifically, the updated color information library can be a color information library that is missing two colors (i.e., the target color information and the interfering color information) after removing the target color information and the interfering color information from the preset color information library. For example, the target color information could be #FFFF00 (yellow). The interfering color information could be #800080 (purple). The updated color information library could then be a color information library that does not contain yellow or purple.

[0044] The second step is to randomly extract color information from the updated color information library. In practice, the executing entity can use the `sample` function to randomly extract color information from the updated color information library. The randomly extracted color information can be any color from the preset color information library other than the target color information and the interference color information. For example, the color information could be #FF0000 (i.e., red).

[0045] The third step is to identify the extracted color information as the second interference color information.

[0046] Fourth, based on the aforementioned second interference color information, the aforementioned first answer hallucination trigger image is subjected to dual color interference processing to obtain the first hallucination trigger image. In practice, the aforementioned executing entity can use Canva to render the first hallucination interference question text included in the aforementioned first answer hallucination trigger image into the color represented by the aforementioned second interference color information to obtain the first hallucination trigger image. The aforementioned first hallucination trigger image can be an image with introduced dual color interference, where the color rendered by the first color interference answer text is different from the color rendered by the first hallucination interference question text, and the color rendered by the first color interference answer text, the color rendered by the first hallucination interference question text, and the color represented by the meaning of the first color interference answer text itself are also different. As an example, the color rendered by the aforementioned first color interference answer text can be purple. The color rendered by the aforementioned first hallucination interference question text can be red. The color represented by the meaning of the aforementioned first color interference answer text itself can be yellow.

[0047] Step 107: Based on the first question hallucination trigger image and the first hallucination trigger image, perform hallucination detection processing on the preset image recognition large model to obtain the first answer hallucination detection information and the first text hallucination detection information.

[0048] In some embodiments, the execution entity may perform hallucination detection processing on a preset image recognition big model based on the first answer hallucination trigger image and the first hallucination trigger image to obtain first answer hallucination detection information and first text hallucination detection information.

[0049] In some optional implementations of certain embodiments, the aforementioned execution entity may perform hallucination detection processing on a preset image recognition large model based on the aforementioned first answer hallucination trigger image and the aforementioned first hallucination trigger image to obtain first answer hallucination detection information and first text hallucination detection information: The first step involves inputting the aforementioned first answer hallucination trigger image and the aforementioned first hallucination trigger image into a preset image recognition model to obtain answer image recognition information and image recognition information. In practice, firstly, the aforementioned executing entity can input the aforementioned first answer hallucination trigger image into the preset image recognition model to obtain answer image recognition information. Then, the aforementioned first hallucination trigger image is input into the preset image recognition model to obtain image recognition information. The aforementioned preset image recognition model can be a large model capable of image recognition, taking the aforementioned first answer hallucination trigger image and the aforementioned first hallucination trigger image as input information, and outputting answer image recognition information and image recognition information. The aforementioned answer image recognition information can be the color recognition information of the aforementioned first answer hallucination trigger image output by the aforementioned preset image recognition model. The aforementioned image recognition information can be the color recognition information of the aforementioned first hallucination trigger image output by the aforementioned preset image recognition model. For example, the aforementioned preset image recognition model can be a bean bun. For example, the aforementioned answer image recognition information can be that the color of the text in the image is purple. The aforementioned image recognition information can be that the color of the text at the top of the image is red, and the color of the text at the bottom of the image is purple.

[0050] The second step involves performing the following steps on the above-mentioned answer image recognition information, based on the aforementioned interfering color information: The first sub-step involves determining that the aforementioned interference color information and the aforementioned answer image recognition information are the same, and then defining the preset correct recognition information as the first answer illusion detection information. Here, the preset correct recognition information can indicate that the answer image recognition information output by the model (i.e., the recognition result of the image text color) matches the text color of the input first answer illusion trigger image, thus indicating correct recognition. For example, the preset correct recognition information could simply be "correct recognition".

[0051] The second sub-step, in response to the determination that the aforementioned interference color information and the aforementioned answer image recognition information are different, determines the preset recognition error information as the first answer illusion detection information. Here, the preset recognition error information may indicate that the answer image recognition information output by the model (i.e., the recognition result of the image text color) does not match the text color of the input first illusion trigger image, resulting in a recognition error. For example, the preset recognition error information may simply be a recognition error.

[0052] Third, based on the aforementioned interfering color information, perform the following steps on the image recognition information: The first sub-step is to determine, in response to the determination that the above-mentioned interference color information and the above-mentioned image recognition information are the same, to determine the preset correctly recognized information as the first text illusion detection information.

[0053] The second sub-step involves determining, in response to the fact that the aforementioned interfering color information and the aforementioned image recognition information are different, that the preset recognition error information is determined as the first text illusion detection information.

[0054] Step 108: Based on the first answer hallucination detection information and the first text hallucination detection information, generate hallucination detection score information.

[0055] In some embodiments, the executing entity may generate hallucination detection scoring information based on first answer hallucination detection information and first text hallucination detection information. In practice, firstly, in response to determining that the first answer hallucination detection information is correctly identified, the executing entity may determine a preset first value as the first question hallucination detection score. Then, in response to determining that the first answer hallucination detection information is incorrectly identified, a preset second value may be determined as the first question hallucination detection score. Next, in response to determining that the first text hallucination detection information is correctly identified, a preset first value may be determined as the first text hallucination detection score. Then, in response to determining that the first text hallucination detection information is incorrectly identified, a preset second value may be determined as the first text hallucination detection score. Afterwards, the product of a preset first question hallucination detection weight and the first question hallucination detection score is determined as a first score. Then, the product of a preset first text hallucination detection weight and the first text hallucination detection score is determined as a second score. Finally, the sum of the first score and the second score is determined as the hallucination detection scoring information. The aforementioned hallucination detection score represents the overall risk level of the preset image recognition model generating hallucinations from the input image. A higher score indicates a greater likelihood that the model output contains hallucinations (i.e., factual errors, fabrications, or inconsistencies). The preset first value represents a score indicating a correct color recognition result. The preset second value represents a score indicating an incorrect color recognition result. The preset first question hallucination detection weight represents the importance of the first answer hallucination detection information in generating the hallucination detection score. The preset first text hallucination detection weight represents the importance of the first text hallucination detection information in generating the hallucination detection score. For example, the preset first question hallucination detection weight can be 0.4. The preset first text hallucination detection weight can be 0.6. The preset first value can be 0. The preset second value can be 100 points.

[0056] In some optional implementations of certain embodiments, the aforementioned execution entity may perform the following steps: The first step involves updating the preset image recognition model based on the aforementioned hallucination detection scoring information, resulting in an updated image recognition model. In practice, the executing entity may, in response to determining that the score represented by the aforementioned hallucination detection scoring information is less than a preset update threshold, add a preset number of fully connected layers to the end of the preset image recognition model to update it, thus obtaining the updated image recognition model. The updated image recognition model can be a preset image recognition model with stronger robustness to color interference (i.e., "color anti-interference ability"). The preset update threshold can be a boundary value that triggers the model update process. The preset number can represent the number of fully connected layers added. For example, the preset update threshold can be 60. The preset number can be 3.

[0057] The second step is to acquire the image to be identified. In practice, the aforementioned execution entity can acquire the image to be identified through a wired or wireless connection. It should be noted that the aforementioned wireless connection methods can include, but are not limited to, 3G / 4G connections, WiFi connections, Bluetooth connections, WiMAX connections, Zigbee connections, UWB (ultra-wideband) connections, and other currently known or future known wireless connection methods. The image to be identified can be a natural image or a synthetic image that has been subjected to specific color interference (such as hue shift, saturation / brightness abrupt changes, channel separation, grayscale conversion, negative effects, random color filters, etc.) for specifically evaluating the color robustness of the model.

[0058] The third step involves inputting the image to be recognized and the preset recognition instructions into the updated image recognition model to obtain anti-interference color recognition information corresponding to the preset recognition instructions. In practice, the executing entity can input the image to be recognized and the preset recognition instructions into the updated image recognition model to obtain anti-interference color recognition information corresponding to the preset recognition instructions. The preset recognition instructions can be natural language instructions used to guide the image recognition task. The anti-interference color recognition information can be the color recognition result generated by the updated image recognition model after overcoming the influence of color interference in the image to be recognized. For example, the preset recognition instructions can be used to identify the color of an object in the image.

[0059] In addressing the technical problems mentioned above, the application scenario of hallucination detection in the automated handling of hazardous materials in complex industrial settings often presents the following challenges: Large models are prone to severe modal hallucinations when performing color recognition in complex industrial environments (e.g., under conditions of reflection, oil contamination, color similarity interference, partial label obstruction, etc.). This can lead to misidentification of the orange label on a common chemical raw material drum as the specific yellow of a highly toxic or explosive substance, resulting in incorrect identification conclusions. Due to these hallucination misjudgments (e.g., misinterpreting "harmless" as "highly toxic"), incorrect "isolation procedures" can be triggered (e.g., using the wrong robotic arm gripper or feeding the wrong reactor), leading to incorrect chemical mixing and potentially causing immediate leaks, explosions, or the generation of toxic gases and other catastrophic consequences. The high risk of misidentification results in low safety of hallucination detection results in real-world applications. Therefore, this application scenario requires the following characteristics: improved model hallucination immunity and enhanced safety of hallucination detection results in real-world applications. To address these technical problems, we have decided to adopt the following solution: In some optional implementations of certain embodiments, the aforementioned execution entity may perform the following steps: The first step is to obtain the second distracting answer text. In practice, the aforementioned executing entity can obtain the second distracting question text pre-stored in a storage device. This second distracting answer text can be English text unrelated to color. The storage device can be a hard drive, solid-state drive, or similar device connected to the executing entity. For example, the second distracting answer text could be "Sunday".

[0060] The second step involves randomly selecting color text information from a preset color text information database and identifying the selected color text information as the target color text information. It should be noted that the method used for randomly selecting color text information from the preset color text information database and identifying the selected color text information as the target color text information is the same as the method used for randomly selecting color text information from the preset color text information database. The target color text information can be randomly selected English words representing colors. For example, the target color text information could be "red".

[0061] The third step is to determine the color information from the preset color information library corresponding to the target color text information as the second color information. This color information can be a code representing a color. The second color information can be #FF0000 (i.e., red).

[0062] Fourth, based on the aforementioned second color information, perform text rendering processing on the aforementioned second interference answer text to obtain the second rendered answer text. It should be noted that the method used to perform text rendering processing on the aforementioned second interference answer text based on the aforementioned second color information to obtain the second rendered answer text is the same as the method used to perform color rendering interference processing on the aforementioned first interference answer text based on the aforementioned interference color information to obtain the first color interference answer text. The aforementioned second rendered answer text can be a non-black second interference answer text rendered with the color represented by the aforementioned second color information. For example, the aforementioned second rendered answer text can be a red second interference answer text.

[0063] The fifth step involves randomly masking the text containing the first hallucination interference problem to obtain the first masked interference problem text. In practice, firstly, the executing entity can use the LEN function to count the total number of characters in the first hallucination interference problem text. Then, half of the total number of characters is determined as the number of random replacement characters. Next, based on the total number of characters, the range function is used to generate a list of position indices. Then, based on the position index list and the number of random replacement characters, the random.sample() function is used to generate a list of random position indices. Afterward, the list function is used to convert the text containing the first hallucination interference problem into a list of first hallucination interference problems. Then, a for loop is used to iterate through the position index list, and for each position index in the list, the following steps are performed: first hallucination interference problem list[i] = replacement character. Finally, for each character in the first hallucination interference problem list after obtaining the replacement characters, the join method is used to combine all character elements in the list to obtain the first masked interference problem text. The first masked interference problem text can be obtained by replacing half of the randomly selected characters in the first hallucination interference problem text with specific symbols (such as...). The new text obtained after () is shown. The total number of characters mentioned above can represent the total number of characters in the first hallucination interference problem text. The position index list mentioned above can be a continuous, 0-incrementing integer sequence that uniquely and precisely corresponds to the position of each character in the first hallucination interference problem text. The random position index list mentioned above can be a specified number of position index subsets randomly and non-repeatingly selected from the complete position index list mentioned above. Each position index in the random position index list mentioned above can indicate which specific positions in the first hallucination interference problem text will be replaced. The 'i' mentioned above can represent the position index. The replacement character mentioned above can be a single symbol used to replace the original character at a specific position in the first hallucination interference problem text. For example, the replacement character mentioned above can be " " or "-". As an example, the first illusion interference question text above could be "what color is the text?". The above position index list could be [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22]. The above first illusion interference question list could be ['w', 'h', 'a', 't', ' ', 'c', 'o', 'l', 'o', 'r', ' ', 'i', 's', ' ', 't', 'h', 'e', ​​' ', 't', 'e', ​​'x', 't', '?']. The above random position index list could be [3,18,7,12,0,15,21,9,5,13,1]. The above first illusion interference question list after character replacement could be [' ',' ','a',' ','',' ','o',' ','o',' ',' ','i',' ',' ','t',' ','e',' ',' ','e','x',' The text regarding the first masking interference issue mentioned above can be... a o o i t e ex ?

[0064] Step 6: Based on the aforementioned first masked interference question text and the aforementioned second rendered answer text, generate a second illusion trigger image. In practice, the executing entity can use the ImageDraw.text() method to generate the second illusion trigger image based on the aforementioned first masked interference question text and the aforementioned second rendered answer text. The second illusion trigger image can be an image that presents both the "first masked interference question text" and the "second rendered answer text," capable of interfering with the large model.

[0065] Step 7: Based on the aforementioned second illusion trigger image, perform illusion detection processing on the preset image recognition large model used for identifying industrial objects to obtain second illusion detection information. In practice, the aforementioned executing entity can input the aforementioned second illusion trigger image into the preset image recognition large model used for identifying industrial objects for illusion detection processing to obtain second illusion trigger image recognition information. Then, in response to determining that the aforementioned interference color information and the aforementioned second illusion trigger image recognition information are the same, the preset correct recognition information is determined as the second illusion detection information. Then, in response to determining that the aforementioned interference color information and the aforementioned second illusion trigger image recognition information are different, the preset incorrect recognition information is determined as the second illusion detection information. The second illusion detection information can be a score representing the authenticity of the color recognition result output by the model. The aforementioned second illusion trigger image recognition information can be the color recognition information of the aforementioned second illusion trigger image output by the aforementioned preset image recognition large model.

[0066] Step 8: Based on the aforementioned second illusion detection information, first answer illusion detection information, and first text illusion detection information, a second illusion detection score is generated. In practice, firstly, the executing entity, in response to determining that the first answer illusion detection information is correctly identified, determines a preset first value as the first question illusion detection score. Then, in response to determining that the first answer illusion detection information is incorrectly identified, a preset second value is determined as the first question illusion detection score. Next, in response to determining that the first text illusion detection information is correctly identified, a preset first value is determined as the first text illusion detection score. Then, in response to determining that the first text illusion detection information is incorrectly identified, a preset second value is determined as the first text illusion detection score. Next, in response to determining that the second illusion detection information is correctly identified, a preset first value is determined as the second illusion detection score. Then, in response to determining that the second illusion detection information is incorrectly identified, a preset second value is determined as the second illusion detection score. Finally, the product of a preset third weight and the first question illusion detection score is determined as the third score. Next, the product of the preset fourth weight and the first text illusion detection score is determined as the fourth score. Then, the product of the second illusion detection information and the preset fifth weight is determined as the fifth score. Finally, the sum of the third score, the fourth score, and the fifth score is determined as the second illusion detection score. The second illusion detection score represents the overall risk level of the preset image recognition model generating illusions in the input image; a higher score indicates a greater likelihood that the model output contains illusions (i.e., factual errors, fabrications, or inconsistencies). The preset third weight represents the importance of the first answer illusion detection information in generating the second illusion detection score. The preset fourth weight represents the importance of the first text illusion detection information in generating the second illusion detection score. The preset fifth weight represents the importance of the second illusion detection information in generating the second illusion detection score. For example, the preset third weight can be 0.2. The preset fourth weight can be 0.3. The preset fifth weight can be 0.5.

[0067] Step 9: Determine the aforementioned second hallucination detection score information as the hallucination detection score information.

[0068] Step 10: Based on the aforementioned hallucination detection scoring information, generate a relay device control command. In practice, the executing entity may, in response to determining that the score represented by the aforementioned hallucination detection scoring information is less than a preset safety threshold, determine a preset disconnection command as a relay device control command. The aforementioned relay device control command may be a command capable of controlling a chemical safety isolation relay device associated with the aforementioned preset image recognition model. The aforementioned preset disconnection command may be to turn off the power and disconnect the connection. The aforementioned preset safety threshold may be a boundary value used to trigger the generation of the relay device control command based on the automated vehicle device control command. For example, the aforementioned preset safety threshold may be 90.

[0069] Step 11: The control command for the aforementioned relay device is sent to the chemical safety isolation relay device associated with the aforementioned preset image recognition model, so that the chemical safety isolation relay device can perform the disconnection operation corresponding to the control command. In practice, the executing entity can send the control command for the aforementioned relay device to the chemical safety isolation relay device associated with the aforementioned preset image recognition model, so that the chemical safety isolation relay device can perform the disconnection operation corresponding to the control command. The aforementioned chemical safety isolation relay device can be a remotely controllable power switch designed for chemical production environments, capable of cutting off the power, signal, or material supply path to a designated equipment or area. The disconnection operation can cause the controlled equipment to stop working due to a power outage.

[0070] The above-described technical solution and its related content, as an inventive point of this disclosure, solve the technical problem of "low security of hallucination detection results in real-world applications." Factors contributing to the low security of hallucination detection results from large image recognition models in real-world applications often include: Large models are prone to severe modal hallucinations when performing color recognition in complex industrial environments (such as under conditions of reflection, oil stains, color similarity interference, and partial label obstruction). They may misidentify the orange label of a common chemical raw material drum as the specific yellow of a highly toxic or explosive substance symbol, resulting in incorrect identification conclusions. Due to hallucination misjudgments (such as mistaking "harmless" for "highly toxic"), incorrect "isolation procedures" may be triggered (e.g., calling the wrong robotic arm gripper or sending it into the wrong reactor), leading to incorrect mixing of chemicals, potentially causing immediate leaks, explosions, or the generation of toxic gases and other catastrophic consequences. The high risk of misidentification results in low security of hallucination detection results in real-world applications. The following requirements are needed for this application scenario: improving the security of hallucination detection results in real-world applications within industrial automated hazardous materials handling processes. Solving the above factors can improve the security of hallucination detection results in real-world applications. To achieve this, firstly, a second interference answer text is obtained. This provides a textual basis for constructing more targeted hallucination detection samples. Then, color text information is randomly extracted from a pre-defined color text information library, and this extracted color text information is designated as the target color text information. This simulates various colors that target objects (such as hazardous material labels) might present in industrial settings, especially those colors easily confused with hazardous colors (such as highly toxic yellow) (such as orange), thus covering key misidentification boundary situations. Next, the color information in the pre-defined color information library corresponding to the target color text information is designated as the second color information. This maps the abstract semantic description (target color text information) to a specific visual color. Then, based on the second color information, the second interference answer text is rendered to obtain the second rendered answer text. This yields the second rendered answer text that introduces color interference. Finally, the first hallucination interference problem text is randomly masked to obtain the first masked interference problem text. This increases the ambiguity and difficulty of understanding of the first illusion interference question text, simulating situations of incomplete information (such as damaged or partially obscured labels) in industrial scenarios. Then, based on the aforementioned first mask interference question text and the aforementioned second rendered answer text, a second illusion trigger image is generated. Thus, by combining the "incomplete / ambiguous first mask interference question text" and the "second rendered answer text with a specific color," an image can be generated to induce color referencing illusions, simulating complex situations in industrial scenarios that lead to model errors.Subsequently, based on the aforementioned second hallucination trigger image, a large-scale image recognition model used for identifying industrial objects is subjected to hallucination detection processing to obtain second hallucination detection information. Thus, the second hallucination trigger image simulating a high-risk misjudgment scenario can be input into the large-scale image recognition model used for identifying industrial objects to observe whether the model produces the expected color semantic confusion under specific conditions, thereby quantifying the model's hallucination tendency in such scenarios. Then, based on the aforementioned second hallucination detection information, the aforementioned first answer hallucination detection information, and the aforementioned first text hallucination detection information, second hallucination detection score information is generated. This allows for a comprehensive and reliable hallucination comprehensive score by integrating the model's performance under various hallucination-induced tests (including the current color confusion test, other question interference tests, and text interference tests). Next, the aforementioned second hallucination detection score information is determined as the hallucination detection score information. Finally, based on the aforementioned hallucination detection score information, relay equipment control commands are generated. Therefore, the hallucination detection and evaluation results of the model can be transformed into actual instructions for the safety control of chemical safety isolation relay devices. For example, when the score is substandard (i.e., the risk of model hallucination is too high), a "cut off power" instruction is generated. Finally, the aforementioned relay device control instruction is sent to the chemical safety isolation relay device associated with the aforementioned preset image recognition large model, so that the chemical safety isolation relay device can execute the disconnection operation corresponding to the relay device control instruction. Thus, when the model is determined to be unsafe, the connection between the model and the execution unit of the control instruction (such as a robotic arm or conveyor belt) can be directly cut off by the relay, preventing erroneous operations that may be caused by model hallucinations, thereby implementing a forced interruption before a safety accident occurs. Because it uses test samples built for specific high-risk misjudgment scenarios (color confusion) and converts hallucination detection scores into control commands, when the hallucination detection score information is low (i.e. the model has a high risk of misjudgment), a physical isolation mechanism is automatically triggered to cut off the connection between the model and the execution unit of the control command (such as a robotic arm or conveyor belt), preventing erroneous operations that may be caused by model hallucinations, thereby improving the safety of hallucination detection results in real-world applications.

[0071] In addressing the technical problems mentioned above, the application scenario of hallucination detection in autonomous vehicles' recognition of traffic environments (such as traffic lights and taillights) often presents the following challenges: Large models are prone to severe modal hallucinations when performing color recognition under complex lighting conditions (such as backlighting, nighttime, and strong light reflection), adverse weather, or partial target occlusion. Since real-time vehicle behavior decisions are based on hallucination detection results, severe modal hallucinations can lead to incorrect traffic light recognition results, resulting in low reliability of the hallucination detection results of the large image recognition model associated with the autonomous vehicle and low safety in vehicle control applications. Therefore, this application scenario requires improving the reliability of hallucination detection results of the large image recognition model associated with the autonomous vehicle and enhancing the safety of vehicle control applications. To address these technical problems, we have decided to adopt the following solution: In some optional implementations of certain embodiments, the aforementioned execution entity may perform the following steps: The first step is to obtain the third-party directional interference question text and the third-party interference answer text. In practice, the aforementioned executing entity can obtain the third-party directional interference question text and the third-party interference answer text pre-stored in a storage device. The third-party directional interference question text can be text with directional referential meaning used to trigger the model's illusion. The third-party interference answer text can be text with a "negative meaning" and color keywords that can trigger the model's illusion. For example, the third-party directional interference question text could be "what color is below?", and the third-party interference answer text could be "Not a green light".

[0072] The second step involves randomly masking the third interference answer text to obtain the third masked interference answer text. It should be noted that the method used to randomly mask the third interference answer text is the same as the method used to randomly mask the first illusion interference question text to obtain the first masked interference question text. Specifically, the third masked interference answer text can be obtained by replacing half of the randomly selected characters in the third interference answer text with specific symbols (such as...). The new text obtained after that.

[0073] The third step involves extracting at least one color text message from a preset color text information library and identifying the extracted at least one color text message as at least one third target color text message. It should be noted that the method used to extract at least one color text message from the preset color text information library and to identify the extracted at least one color text message as at least one third target color text message is the same as the method used to randomly extract color text messages from the preset color text information library. Each color text message in the extracted at least one color text message can be a randomly selected English word representing a color.

[0074] The fourth step is to determine the color information in the preset color information library corresponding to each of the at least one third target color text information as the third color information.

[0075] Fifth, based on the obtained at least one third color information, perform multi-color rendering processing on the aforementioned third mask interference answer text to obtain third multi-color answer text. It should be noted that the method used to perform multi-color rendering processing on the aforementioned third mask interference answer text based on the obtained at least one third color information to obtain third multi-color answer text is the same as the method used to perform color rendering interference processing on the aforementioned first interference answer text based on the aforementioned interference color information to obtain first-color interference answer text. The aforementioned third multi-color answer text can be a third mask interference answer text with multiple colors after rendering at least one color represented by the aforementioned at least one third color information.

[0076] Step 6: Based on the aforementioned third-party directional interference question text and the aforementioned third-party multi-color answer text, generate the third illusion trigger image. It should be noted that the method used to generate the third illusion trigger image based on the aforementioned third-party directional interference question text and the aforementioned third-party multi-color answer text is the same as the method used to generate the second illusion trigger image based on the aforementioned first mask interference question text and the aforementioned second rendered answer text. The aforementioned third illusion trigger image can be an image that presents both the "third-party directional interference question text" and the "third-party multi-color answer text," capable of causing color recognition and directional interference to the large model.

[0077] Step 7: Based on the aforementioned third illusion trigger image, perform multi-color illusion detection processing on the preset image recognition large model to obtain third illusion detection information. It should be noted that the method used to perform multi-color illusion detection processing on the preset image recognition large model based on the aforementioned third illusion trigger image to obtain third illusion detection information is the same as the method used to perform illusion detection processing on the preset image recognition large model used for industrial object recognition based on the aforementioned second illusion trigger image to obtain second illusion detection information. The aforementioned third illusion detection information can be a score representing the authenticity of the color recognition results output by the model.

[0078] Step 8: Based on the aforementioned third illusion detection information, second illusion detection information, first answer illusion detection information, and first text illusion detection information, a third illusion detection score is generated. In practice, firstly, the executing entity may, in response to determining that the first answer illusion detection information is correctly identified, set a preset first value as the first question illusion detection score. Then, in response to determining that the first answer illusion detection information is incorrectly identified, set a preset second value as the first question illusion detection score. Next, in response to determining that the first text illusion detection information is correctly identified, set a preset first value as the first text illusion detection score. Then, in response to determining that the first text illusion detection information is incorrectly identified, set a preset second value as the first text illusion detection score. Next, in response to determining that the second illusion detection information is correctly identified, set a preset first value as the second illusion detection score. Then, in response to determining that the second illusion detection information is incorrectly identified, set a preset second value as the second illusion detection score. Next, in response to determining that the aforementioned third illusion detection information is correctly identified, a preset first value is determined as the third illusion detection score. Then, in response to determining that the aforementioned third illusion detection information is incorrectly identified, a preset second value is determined as the third illusion detection score. Afterwards, the product of a preset sixth weight and the aforementioned first problem illusion detection score is determined as the sixth score. Then, the product of a preset seventh weight and the aforementioned first text illusion detection score is determined as the seventh score. Afterwards, the product of a preset eighth weight and the aforementioned second illusion detection score is determined as the eighth score. Next, the product of the aforementioned third illusion detection score and a preset ninth weight is determined as the ninth score. Finally, the sum of the aforementioned sixth score, seventh score, eighth score, and ninth score is determined as the third illusion detection score information. The aforementioned third illusion detection score information can represent the overall risk level of the aforementioned preset image recognition model generating illusions in the input image; the higher the score, the greater the possibility that the model output contains illusions (i.e., factual errors, fabrications, or inconsistencies). The aforementioned preset sixth weight can represent the importance of the first answer illusion detection information in generating the third illusion detection score. The aforementioned preset seventh weight can represent the importance of the first text illusion detection information in generating the third illusion detection score. The aforementioned preset eighth weight can represent the importance of the second illusion detection information in generating the third illusion detection score. The aforementioned preset ninth weight can represent the importance of the third illusion detection information in generating the third illusion detection score. For example, the aforementioned preset sixth weight can be 0.1. The aforementioned preset seventh weight can be 0.2. The aforementioned preset eighth weight can be 0.3. The aforementioned preset ninth weight can be 0.4.

[0079] The ninth step is to determine the aforementioned third hallucination detection score information as the hallucination detection score information.

[0080] Step 10: Based on the aforementioned hallucination detection scoring information, generate control instructions for the autonomous driving device. In practice, the executing entity may, in response to determining that the score represented by the aforementioned hallucination detection scoring information is less than a preset second safety threshold, determine the preset deceleration instruction as a relay device control instruction. The aforementioned autonomous driving device control instruction can be an instruction capable of controlling an autonomous driving device associated with the aforementioned preset image recognition large model. The aforementioned preset deceleration instruction can be a boundary value that triggers the generation of the autonomous driving device control instruction. For example, the aforementioned preset second safety threshold can be 95. The aforementioned preset deceleration instruction can be to perform deceleration and maintain a safe distance.

[0081] The eleventh step involves sending the aforementioned autonomous driving device control commands to the autonomous driving device associated with the aforementioned preset image recognition model, so that the autonomous driving device can execute the operations corresponding to the aforementioned autonomous driving device control commands. In practice, the executing entity can send the aforementioned autonomous driving device control commands to the autonomous driving device associated with the aforementioned preset image recognition model, so that the autonomous driving device can execute the operations corresponding to the aforementioned autonomous driving device control commands. The aforementioned autonomous driving device can be an intelligent vehicle associated with the aforementioned preset image recognition model, equipped with an autonomous driving system, and capable of receiving and executing external control commands. The aforementioned operation can be slowing down the driving speed.

[0082] The above-described technical solution and its related content, as an inventive point of this disclosure, solve the technical problem of "low reliability of the illusion detection results of large-scale image recognition models associated with autonomous vehicles and low security in vehicle control applications." Factors leading to low reliability and security in vehicle control applications of large-scale image recognition models associated with autonomous vehicles often include the following: When performing color recognition under complex lighting conditions (such as backlighting, nighttime, strong light reflection), adverse weather conditions, or partial target occlusion, the large model is prone to severe modal illusions regarding key color information (such as red / green lights, brake lights). Since real-time decisions on vehicle behavior need to be made based on the illusion detection results, severe modal illusions may cause the model to output incorrect traffic light recognition results, resulting in low reliability and security in vehicle control applications of large-scale image recognition models associated with autonomous vehicles. Solving these factors can improve the reliability and security of illusion detection results in vehicle control applications of large-scale image recognition models associated with autonomous vehicles. To achieve this effect, firstly, third-party interference question text and third-party interference answer text are obtained. Then, the aforementioned third interference answer text is randomly masked to obtain a third masked interference answer text. This simulates scenarios where information is incomplete or partially obscured in reality (such as worn or soiled text), increasing the difficulty for the model to understand the problem and reason based on visual features. Next, at least one color text information is extracted from a preset color text information library, and this extracted at least one color text information is identified as at least one third target color text information. This simulates multiple color combinations that may occur simultaneously in a traffic environment (e.g., red, yellow, green, blue, etc.), especially those colors commonly found in traffic lights, signs, and vehicle taillights, which are crucial for vehicle control decisions, thus constructing a scenario covering multi-color interactions and confusion. Then, the color information in the preset color information library corresponding to each of the aforementioned at least one third target color text information is identified as the third color information. This transforms abstract semantics into concrete visual colors. Subsequently, based on the obtained at least one third color information, the aforementioned third masked interference answer text undergoes multi-color rendering processing to obtain a third multi-color answer text. This generates a third multi-color answer text that visually contains multiple key colors. Then, based on the aforementioned third-direction interference question text and the aforementioned third-multicolor answer text, a third hallucination triggering image is generated. Thus, the third-direction interference question text with directional reference and the third-multicolor answer text containing multicolor interference can be combined to generate an image that can simulate hallucinations that may be induced under complex road conditions for autonomous driving (such as intersections with multiple traffic lights and complex lighting environments). Subsequently, based on the aforementioned third hallucination triggering image, a preset image recognition model is subjected to multicolor hallucination detection processing to obtain third hallucination detection information.Therefore, the model's resistance to hallucinations caused by multi-color and directional semantic interference can be evaluated in key autonomous driving scenarios (intersection signal understanding). Then, based on the aforementioned third hallucination detection information, second hallucination detection information, first answer hallucination detection information, and first text hallucination detection information, a third hallucination detection score is generated. This allows for a comprehensive reliability score that covers a wider range and more closely reflects the complexities of real-world driving environments, by integrating the model's performance in various, multi-dimensional hallucination-induced tests (including the current complex direction-color test, color confusion test, question interference test, and text interference test). Next, the aforementioned third hallucination detection score is designated as the hallucination detection score. Then, based on the aforementioned hallucination detection score, control commands for the autonomous driving device are generated. This allows the hallucination detection evaluation results of the model to be transformed into actual commands that can safely control the autonomous driving device; for example, when the score is unsatisfactory (i.e., the model's hallucination risk is too high), a command to "slow down and maintain a safe distance" is generated. Finally, the aforementioned autonomous driving control commands are sent to the autonomous driving device associated with the aforementioned preset image recognition model, allowing the autonomous driving device to execute operations corresponding to the aforementioned control commands. This allows for the constraint of actual vehicle behavior even when the model is deemed unsafe, reducing the possibility of traffic accidents directly caused by model hallucinations. Furthermore, by using simulated autonomous driving scenarios (traffic lights, taillight color judgment) to construct hallucination test samples for hallucination detection, and converting the hallucination detection scores into autonomous driving control commands, when the hallucination detection score is low (i.e., the model has a high risk of misjudgment), a command is sent to control the vehicle to decelerate. This prevents the model from making dangerous decisions on actual roads based on unreliable hallucination detection results, thus improving the reliability of the hallucination detection results of the image recognition model associated with the autonomous vehicle and the safety of its vehicle control applications.

[0083] The above-described embodiments of this disclosure have the following beneficial effects: the multimodal large model illusion attack method based on textual reference interference in some embodiments of this disclosure improves the comprehensiveness and reliability of multimodal large model illusion attacks based on textual reference interference. Specifically, the reason why the results of multimodal large model illusion attacks based on textual reference interference have low comprehensiveness and low reliability in real applications is that: when performing illusion detection by thresholding the confidence score output by the image recognition large model or by only using a standard test set for accuracy evaluation, the images in the standard test set usually cannot cause modal conflict, and the illusion detection result output by the model may be normal. However, when performing illusion detection on images with special textual references, color recognition under modal conflict is easily interfered with by both textual bias and dynamic decision imbalance. Image recognition large models generally have the phenomenon of "blindly believing the text in the image." Special textual references in the image can interfere with the image recognition large model's ability to recognize and perceive colors, causing it to produce illusions, and the illusion detection result may be abnormal. The accuracy evaluation of hallucination detection using a standard test set has flaws, resulting in a one-sided hallucination detection approach. This leads to low comprehensiveness and reliability of hallucination detection results in large-scale image recognition models in real-world applications. Therefore, this disclosure presents a multimodal large-scale hallucination attack method based on textual referential interference. First, a first hallucination interference question text is obtained. Then, based on a preset color text information library, a first interference answer text is generated. This yields a first interference answer text whose meaning represents a certain color. Next, based on the preset color information library, the first interference answer text undergoes color rendering interference processing to obtain a first color interference answer text. This yields a first color interference answer text rendered with a color different from its literal meaning, inducing semantic referential hallucinations in the model and testing whether the model is misled by the literal meaning of the text. Then, based on the first hallucination interference question text and the first color interference answer text, a first hallucination interference text is generated. This yields a first hallucination interference text used to generate a first answer hallucination trigger image. Finally, the first hallucination interference text undergoes hallucination trigger image generation processing to obtain the first answer hallucination trigger image. Thus, a first answer illusion trigger image can be obtained to induce semantic referential illusion in the model and test whether the model is misled by the literal meaning of the text. Next, the first answer illusion trigger image is subjected to dual color interference processing to obtain a first illusion trigger image. This yields a first illusion trigger image with dual color interference to test whether the model can distinguish between the "color of the question itself" and the "color of the target the question points to." Subsequently, based on the first answer illusion trigger image and the first illusion trigger image, a pre-defined image recognition large-scale model is subjected to illusion detection processing to obtain first answer illusion detection information and first text illusion detection information.Therefore, the model's color recognition and perception capabilities can be scored based on the first answer hallucination detection information and the first text hallucination detection information. Finally, based on the aforementioned first answer hallucination detection information and the aforementioned first text hallucination detection information, hallucination detection score information is generated. Thus, hallucination detection score information characterizing the model's color recognition and perception capabilities on hallucinogenic images can be obtained. Because images capable of inducing semantic referential hallucinations are used for hallucination detection, and dual color interference is introduced, the model's ability to distinguish between the "color of the question itself" and the "color of the target the question points to" is tested. Using different types of images capable of inducing hallucinations expands the detection coverage, making hallucination detection more comprehensive. This avoids the potential masking issues that can arise from using only a standard test set that cannot induce modal conflict for hallucination detection, and more effectively exposes errors that the model may make in practical applications. Furthermore, this improves the comprehensiveness and reliability of multimodal large-model hallucination attacks based on text referential interference.

[0084] Further reference Figure 2 As an implementation of the methods shown in the figures, this disclosure provides some embodiments of a multimodal large-model illusion attack device based on textual referential interference. These device embodiments are similar to... Figure 1 Corresponding to the method embodiments shown, the device can be specifically applied to various electronic devices.

[0085] like Figure 2As shown, a multimodal large-scale illusion attack device 200 based on textual referential interference in some embodiments includes: an acquisition unit 201, a first generation unit 202, a color rendering interference processing unit 203, a second generation unit 204, an illusion trigger image generation processing unit 205, a dual-color interference processing unit 206, an illusion detection processing unit 207, and a third generation unit 208. The acquisition unit 201 is configured to acquire a first illusion interference question text; the first generation unit 202 is configured to generate a first interference answer text based on a preset color text information library; the color rendering interference processing unit 203 is configured to perform color rendering interference processing on the first interference answer text based on a preset color information library to obtain a first color interference answer text; the second generation unit 204 is configured to generate a first illusion interference text based on the first illusion interference question text and the first color interference answer text; and the illusion trigger image generation processing unit 205 is configured to perform illusion processing on the first illusion interference text. The first answer hallucination trigger image generation process is used to obtain the first answer hallucination trigger image; the dual color interference processing unit 206 is configured to perform dual color interference processing on the first answer hallucination trigger image to obtain the first hallucination trigger image; the hallucination detection processing unit 207 is configured to perform hallucination detection processing on a preset image recognition large model based on the first answer hallucination trigger image and the first hallucination trigger image to obtain the first answer hallucination detection information and the first text hallucination detection information; the third generation unit 208 is configured to generate hallucination detection scoring information based on the first answer hallucination detection information and the first text hallucination detection information.

[0086] It is understandable that the units described in the device 200 are related to the reference. Figure 1 The steps in the method described above correspond to each other. Therefore, the operations, features, and beneficial effects described above for the method also apply to the device 200 and the units contained therein, and will not be repeated here.

[0087] The following is for reference. Figure 3 It shows a schematic diagram of the structure of an electronic device 300 suitable for implementing some embodiments of the present disclosure. Figure 3 The electronic device shown is merely an example and should not be construed as limiting the functionality and scope of the embodiments of this disclosure.

[0088] like Figure 3As shown, the electronic device 300 may include a processing unit (e.g., a central processing unit, a graphics processing unit, etc.) 301, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 302 or a program loaded from a storage device 308 into a random access memory (RAM) 303. The RAM 303 also stores various programs and data required for the operation of the electronic device 300. The processing unit 301, ROM 302, and RAM 303 are interconnected via a bus 304. An input / output (I / O) interface 305 is also connected to the bus 304.

[0089] Typically, the following devices can be connected to I / O interface 305: input devices 306 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices 307 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; storage devices 308 including, for example, magnetic tapes, hard disks, etc.; and communication devices 309. Communication device 309 allows electronic device 300 to communicate wirelessly or wiredly with other devices to exchange data. Although Figure 3 An electronic device 300 with various devices is shown; however, it should be understood that it is not required to implement or possess all of the devices shown. More or fewer devices may be implemented or possessed alternatively. Figure 3 Each box shown can represent a device or multiple devices as needed.

[0090] In particular, according to some embodiments of this disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, some embodiments of this disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via communication device 309, or installed from storage device 308, or installed from ROM 302. When the computer program is executed by processing device 301, it performs the functions defined in the methods of some embodiments of this disclosure.

[0091] It should be noted that, in some embodiments of this disclosure, the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium, or any combination thereof. A computer-readable storage medium may be, for example,—but not limited to—an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In some embodiments of this disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In some embodiments of this disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium can be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to: wires, optical fibers, RF (radio frequency), etc., or any suitable combination thereof.

[0092] In some implementations, clients and servers can communicate using any currently known or future-developed network protocol such as HTTP (Hypertext Transfer Protocol) and can interconnect with digital data communication (e.g., communication networks) of any form or medium. Examples of communication networks include local area networks (“LANs”), wide area networks (“WANs”), the Internet (e.g., the Internet of Things), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.

[0093] A computer-readable medium may be contained within an electronic device or may exist independently, not assembled into the electronic device. The computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: acquire a first hallucination interference question text; generate a first interference answer text based on a preset color text information library; perform color rendering interference processing on the first interference answer text based on the preset color information library to obtain a first color interference answer text; generate a first hallucination interference text based on the first hallucination interference question text and the first color interference answer text; perform hallucination trigger image generation processing on the first hallucination interference text to obtain a first answer hallucination trigger image; perform dual color interference processing on the first answer hallucination trigger image to obtain a first hallucination trigger image; perform hallucination detection processing on a preset image recognition large model based on the first answer hallucination trigger image and the first hallucination trigger image to obtain first answer hallucination detection information and first text hallucination detection information; and generate hallucination detection scoring information based on the first answer hallucination detection information and the first text hallucination detection information.

[0094] Computer program code for performing operations of some embodiments of this disclosure can be written in one or more programming languages ​​or a combination thereof. Programming languages ​​include object-oriented programming languages—such as Java, Smalltalk, and C++—and conventional procedural programming languages—such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network—including a local area network (LAN) or a wide area network (WAN)—or can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0095] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0096] The units described in some embodiments of this disclosure can be implemented in software or hardware. The described units can also be housed in a processor; for example, a processor may be described as including an acquisition unit, a first generation unit, a color rendering interference processing unit, a second generation unit, a hallucination trigger image generation processing unit, a dual color interference processing unit, a hallucination detection processing unit, and a third generation unit. The names of these units do not necessarily limit the unit itself; for example, the first generation unit may also be described as "a unit that generates first interference answer text based on a preset color text information database."

[0097] The functions described above in this document can be performed at least in part by one or more hardware logic components. For example, exemplary types of hardware logic components that can be used, without limitation, include: field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), system-on-a-chip (SoCs), complex programmable logic devices (CPLDs), and so on.

[0098] The above description is merely a selection of preferred embodiments of this disclosure and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in the embodiments of this disclosure is not limited to technical solutions formed by specific combinations of technical features, but should also cover other technical solutions formed by arbitrary combinations of technical features or their equivalents without departing from the inventive concept. For example, technical solutions formed by substituting features with (but not limited to) technical features with similar functions disclosed in the embodiments of this disclosure.

Claims

1. A multimodal large-model illusion attack method based on textual referential interference, comprising: Obtain the text of the first hallucination interference problem; Based on a preset color text information database, generate the first distracting answer text; Based on a preset color information library, the first interference answer text is subjected to color rendering interference processing to obtain the first color interference answer text; Based on the first illusion interference question text and the first color interference answer text, generate the first illusion interference text; The first hallucination interference text is processed to generate a hallucination trigger image, resulting in the first answer hallucination trigger image. The first answer hallucination trigger image is subjected to dual color interference processing to obtain the first hallucination trigger image; Based on the first answer hallucination trigger image and the first hallucination trigger image, hallucination detection processing is performed on the preset image recognition large model to obtain the first answer hallucination detection information and the first text hallucination detection information; Based on the first answer hallucination detection information and the first text hallucination detection information, hallucination detection score information is generated.

2. The method according to claim 1, wherein, The process of generating the first distracting answer text based on a preset color text information database includes: Randomly select color text information from a preset color text information library; The extracted color text information was identified as the first distractor answer text.

3. The method according to claim 2, wherein, Each color text information in the preset color text information database corresponds to each color information in the preset color information database, and the step of performing color rendering interference processing on the first interference answer text based on the preset color information database to obtain the first color interference answer text includes: The color information corresponding to the extracted color text information is determined as the target color information; Based on the target color information, the preset color information library is updated to obtain an updated color information library; Randomly select color information from the updated color information database; The extracted color information is identified as interfering color information; Based on the interference color information, the first interference answer text is subjected to color rendering interference processing to obtain the first color interference answer text.

4. The method according to claim 1, wherein, The process of generating a hallucination trigger image from the first hallucination interference text to obtain the first answer hallucination trigger image includes: The first hallucination interference text is subjected to random and diverse layout processing to obtain random hallucination interference text; The random hallucination interference text is subjected to coordinate mapping processing to obtain a mapped hallucination interference image; The mapped illusion interference image is subjected to font edge smoothing processing to obtain a smoothed illusion interference image; The smoothed illusion interference image is identified as the first answer illusion trigger image.

5. The method according to claim 3, wherein, The process of performing hallucination detection processing on a preset image recognition large model based on the first answer hallucination trigger image and the first text hallucination trigger image to obtain first answer hallucination detection information and first text hallucination detection information includes: The first answer hallucination trigger image and the first hallucination trigger image are input into a preset image recognition large model to obtain answer image recognition information and image recognition information; Based on the interference color information, the following steps are performed on the answer image recognition information: In response to determining that the interference color information and the answer image recognition information are the same, the preset correct recognition information is determined as the first answer illusion detection information; In response to determining that the interference color information and the answer image recognition information are different, the preset recognition error information is determined as the first answer hallucination detection information; Based on the interference color information, the following steps are performed on the image recognition information: In response to determining that the interference color information and the image recognition information are the same, the preset correctly recognized information is determined as the first text illusion detection information; In response to the determination that the interference color information and the image recognition information are different, the preset recognition error information is determined as the first text illusion detection information.

6. The method according to claim 3, wherein, The process of performing dual-color interference processing on the first answer hallucination trigger image to obtain the first hallucination trigger image includes: Based on the interference color information, the updated color information library is processed to obtain the updated color information library; Randomly extract color information from the updated color information database; The extracted color information is identified as the second interfering color information; Based on the second interference color information, the first answer hallucination trigger image is subjected to dual color interference processing to obtain the first hallucination trigger image.

7. The method according to claim 1, wherein, The method further includes: Based on the hallucination detection scoring information, the preset image recognition model is updated to obtain the updated image recognition model. Acquire the image to be recognized; The image to be recognized and the preset recognition command are input into the updated image recognition large model to obtain anti-interference color recognition information corresponding to the preset recognition command.

8. A multimodal large-model illusion attack device based on textual referential interference, comprising: The acquisition unit is configured to acquire the text of the first hallucination interference problem. The first generation unit is configured to generate the first distracting answer text based on a preset color text information library; The color rendering interference processing unit is configured to perform color rendering interference processing on the first interference answer text based on a preset color information library to obtain the first color interference answer text. The second generation unit is configured to generate the first illusion interference text based on the first illusion interference question text and the first color interference answer text; The hallucination triggering image generation processing unit is configured to perform hallucination triggering image generation processing on the first hallucination interference text to obtain the first answer hallucination triggering image; A dual-color interference processing unit is configured to perform dual-color interference processing on the first answer hallucination trigger image to obtain the first hallucination trigger image; The hallucination detection processing unit is configured to perform hallucination detection processing on a preset image recognition large model based on the first answer hallucination trigger image and the first hallucination trigger image to obtain the first answer hallucination detection information and the first text hallucination detection information; The third generation unit is configured to generate hallucination detection score information based on the first answer hallucination detection information and the first text hallucination detection information.

9. An electronic device, comprising: One or more processors; A storage device on which one or more programs are stored; When the one or more programs are executed by the one or more processors, the one or more processors implement the method as described in any one of claims 1 to 7.

10. A computer-readable medium having a computer program stored thereon, wherein, When the program is executed by the processor, it implements the method as described in any one of claims 1 to 7.