A multi-modal universal adversarial sample generation method based on image fusion
By using image fusion and multi-loss function optimization, a multimodal general adversarial example is generated, which solves the problem of difficulty in attacking multimodal intelligent systems in existing technologies and achieves a general attack effect on single-band and multi-band models.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NAT UNIV OF DEFENSE TECH
- Filing Date
- 2024-02-29
- Publication Date
- 2026-06-30
AI Technical Summary
Existing adversarial example attack methods mainly target single models and are difficult to effectively attack multimodal and multiband intelligent systems, such as intelligent systems with multiple sensor sources. There is a lack of general adversarial example generation methods for multi-source fusion data.
Image fusion technology is used to fuse single-band images to generate multi-band information images. Random perturbations are added to the fused images, and multiple loss functions are used for optimization to achieve a general attack on both single-band and multi-band models.
It achieves a universal attack on single-band and multi-band models, and can simultaneously deceive multiple single-mode and multi-mode task models, improving the success rate and efficiency of the attack.
Figure CN118015417B_ABST
Description
Technical Field
[0001] This invention belongs to the field of image processing technology, and in particular relates to a multimodal general adversarial sample generation method based on image fusion.
Background Technology
[0002] With the development of artificial intelligence, the field of intelligent adversarial warfare has attracted increasing attention. Research on adversarial examples is now crucial, encompassing white-box attacks, black-box attacks, and attack-driven defense.
[0003] Current adversarial attack methods mainly target single models or single-source data. However, current intelligent systems, such as navigation, guidance, image recognition, and target tracking systems, are often multimodal, multi-band, or multi-source sensor systems (e.g., compatible with optical, infrared, and radar imaging). The terminology varies by field: computer science and automation usually call such systems "multimodal", remote sensing and optoelectronics often call them "multi-band", and aerospace usually calls them "multi-source sensor" systems.
[0004] It is evident that the following problems urgently need to be addressed: research on multi-band image data and multi-source fused data; compatibility with ultraviolet / visible / infrared / SAR and other bands; physical-domain adversarial implementation for multi-source / multi-band systems; and general adversarial sample generation methods for source images / fused images.
Summary of the Invention
[0005] To address the aforementioned technical problems, this invention proposes a multimodal general adversarial sample generation scheme based on image fusion.
[0006] This invention focuses on general intelligent adversarial attacks on source / fused images in multi-band intelligent adversarial systems. Using multi-band fused images as the data source, it attacks both single-band and multi-band models simultaneously by adding perturbations. After the perturbations are adjusted through multiple loss functions over multiple iterations, a general attack on both single-band and multi-band models is achieved. This solves the one-to-many general attack problem, in which a single adversarial example is used to attack several single-band models for the same task together with their corresponding multi-band models for that task.
[0007] The first aspect of this invention proposes a multimodal universal adversarial example generation method based on image fusion. The method includes:
[0008] Step S1: Obtain K single-modal single-channel images, and fuse the K single-modal single-channel images to obtain a multimodal single-channel fused image;
[0009] Step S2: Add random perturbation noise to the multimodal single-channel fused image to generate adversarial example images of the multimodal single-channel fused image;
[0010] Step S3: For the K single-modal single-channel images, construct K single-modal image task models, and further construct a cascaded model and a fusion model of the K single-modal image task models;
[0011] Step S4: Input the adversarial sample images into the K single-modal image task models, the cascaded model, and the fusion model respectively to perform adversarial attacks, obtain the attack success rates $P_1, \ldots, P_K$, $P_{\text{cascade}}$, and $P_{\text{fusion}}$ respectively, and calculate the corresponding loss functions $L_1, \ldots, L_K$, $L_{\text{cascade}}$, and $L_{\text{fusion}}$;
[0012] Step S5: Use a gradient descent-based backpropagation algorithm to continuously perform adversarial attacks on the K single-modal image task models, the cascaded model, and the fusion model until the loss functions $L_1, \ldots, L_K$, $L_{\text{cascade}}$, and $L_{\text{fusion}}$ are higher than the loss function threshold and the attack success rates $P_1, \ldots, P_K$, $P_{\text{cascade}}$, and $P_{\text{fusion}}$ are higher than the attack success rate threshold.
[0013] According to the method of the first aspect of the present invention, the K single-modal single-channel images have different modalities, totaling K single-modalities, and the K single-modal image task models correspond to the K single-modalities.
[0014] According to the method of the first aspect of the present invention, the cascaded model is a model obtained by cascading the K single-modal image task models, and the fusion model is a model used to fuse the K single-modal models.
[0015] According to the method of the first aspect of the present invention, the K single-modal image task models, the cascaded model and the fusion model are all trained optimized models with recognition accuracy higher than the optimization threshold.
[0016] According to the method of the first aspect of the present invention, in step S5, the random perturbation noise added to the multimodal single-channel fused image is continuously updated, thereby continuously generating new adversarial example images. When the loss functions $L_1, \ldots, L_K$, $L_{\text{cascade}}$, and $L_{\text{fusion}}$ are higher than the loss function threshold and the attack success rates $P_1, \ldots, P_K$, $P_{\text{cascade}}$, and $P_{\text{fusion}}$ are higher than the attack success rate threshold, the corresponding adversarial sample image is saved as the multimodal general adversarial sample image for the K single modalities.
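The save condition of step S5 can be written compactly. In the sketch below, $x_f$ (the fused image), $\delta$ (the perturbation), $\tau_L$ and $\tau_P$ (the loss and attack success rate thresholds), and the index set $\mathcal{M}$ are notation introduced here for illustration only; the original text names the thresholds but assigns them no symbols.

```latex
\[
x_{\text{adv}} = x_f + \delta, \qquad
\mathcal{M} = \{1, \ldots, K, \text{cascade}, \text{fusion}\}
\]
\[
\text{save } x_{\text{adv}} \iff
\forall m \in \mathcal{M}:\quad
L_m(x_{\text{adv}}) > \tau_L \ \text{ and } \ P_m(x_{\text{adv}}) > \tau_P
\]
```

Read this way, a single perturbed image must satisfy K + 2 pairs of conditions at once, which is what makes the resulting example "general" across the single-modal, cascaded, and fusion models.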
[0017] A second aspect of this invention proposes a multimodal universal adversarial example generation system based on image fusion. The system includes:
[0018] The first processing unit is configured to: acquire K single-modal single-channel images, fuse the K single-modal single-channel images, and obtain a multimodal single-channel fused image;
[0019] The second processing unit is configured to add random perturbation noise to the multimodal single-channel fused image to generate an adversarial example image of the multimodal single-channel fused image.
[0020] The third processing unit is configured to: construct K single-modal image task models for the K single-modal single-channel images, and further construct a cascaded model and a fusion model of the K single-modal image task models;
[0021] The fourth processing unit is configured to: input the adversarial sample images into the K single-modal image task models, the cascaded model, and the fusion model respectively for adversarial attacks, obtain the attack success rates $P_1, \ldots, P_K$, $P_{\text{cascade}}$, and $P_{\text{fusion}}$ respectively, and calculate the corresponding loss functions $L_1, \ldots, L_K$, $L_{\text{cascade}}$, and $L_{\text{fusion}}$;
[0022] The fifth processing unit is configured to: continuously perform adversarial attacks on the K single-modal image task models, the cascaded model, and the fusion model using a gradient descent-based backpropagation algorithm, until the loss functions $L_1, \ldots, L_K$, $L_{\text{cascade}}$, and $L_{\text{fusion}}$ are higher than the loss function threshold and the attack success rates $P_1, \ldots, P_K$, $P_{\text{cascade}}$, and $P_{\text{fusion}}$ are higher than the attack success rate threshold.
[0023] A third aspect of this invention discloses an electronic device. The electronic device includes a memory and a processor. The memory stores a computer program, and when the processor executes the computer program, it implements a multimodal general adversarial example generation method based on image fusion, as disclosed in the first aspect of this invention.
[0024] A fourth aspect of this invention discloses a computer-readable storage medium. The computer-readable storage medium stores a computer program, which, when executed by a processor, implements a multimodal general adversarial example generation method based on image fusion, as described in the first aspect of this disclosure.
[0025] In summary, this invention uses image fusion technology to fuse single-band images: two single-channel single-band (single-modal) images are fused to generate a single-channel fused image containing the multi-band (multi-modal) information of both. Random perturbations are then added to the fused image. By simultaneously attacking single-modal models, multi-modal composite models, and multi-modal fusion models, and by optimizing the perturbation through backpropagation of the loss functions of these models, a general adversarial capability that can attack single-modal task models and multi-modal task models at the same time is finally achieved.
Brief Description of the Drawings
[0026] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0027] Figure 1 is a flowchart of a multimodal general adversarial example generation method based on image fusion according to an embodiment of the present invention.
[0028] Figures 2(a)-(p) are schematic diagrams of the results of attacking an intelligent detector with the multimodal general adversarial sample according to an embodiment of the present invention.
[0029] Figure 3 is a structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
[0030] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0031] This invention focuses on general intelligent adversarial attacks on source / fused images in multi-band intelligent adversarial systems. Using multi-band fused images as the data source, it attacks both single-band and multi-band models simultaneously by adding perturbations. After the perturbations are adjusted through multiple loss functions over multiple iterations, a general attack on both single-band and multi-band models is achieved. This solves the one-to-many general attack problem, in which a single adversarial example is used to attack several single-band models for the same task together with their corresponding multi-band models for that task.
[0032] The first aspect of this invention proposes a multimodal universal adversarial example generation method based on image fusion. The method includes:
[0033] Step S1: Obtain K single-modal single-channel images, and fuse the K single-modal single-channel images to obtain a multimodal single-channel fused image;
[0034] Step S2: Add random perturbation noise to the multimodal single-channel fused image to generate adversarial example images of the multimodal single-channel fused image;
[0035] Step S3: For the K single-modal single-channel images, construct K single-modal image task models, and further construct a cascaded model and a fusion model of the K single-modal image task models;
[0036] Step S4: Input the adversarial sample images into the K single-modal image task models, the cascaded model, and the fusion model respectively to perform adversarial attacks, obtain the attack success rates $P_1, \ldots, P_K$, $P_{\text{cascade}}$, and $P_{\text{fusion}}$ respectively, and calculate the corresponding loss functions $L_1, \ldots, L_K$, $L_{\text{cascade}}$, and $L_{\text{fusion}}$;
[0037] Step S5: Use a gradient descent-based backpropagation algorithm to continuously perform adversarial attacks on the K single-modal image task models, the cascaded model, and the fusion model until the loss functions $L_1, \ldots, L_K$, $L_{\text{cascade}}$, and $L_{\text{fusion}}$ are higher than the loss function threshold and the attack success rates $P_1, \ldots, P_K$, $P_{\text{cascade}}$, and $P_{\text{fusion}}$ are higher than the attack success rate threshold.
[0038] According to the method of the first aspect of the present invention, the K single-modal single-channel images have different modalities, totaling K single-modalities, and the K single-modal image task models correspond to the K single-modalities.
[0039] According to the method of the first aspect of the present invention, the cascaded model is a model obtained by cascading the K single-modal image task models, and the fusion model is a model used to fuse the K single-modal models.
[0040] According to the method of the first aspect of the present invention, the K single-modal image task models, the cascaded model and the fusion model are all trained optimized models with recognition accuracy higher than the optimization threshold.
[0041] According to the method of the first aspect of the present invention, in step S5, the random perturbation noise added to the multimodal single-channel fused image is continuously updated, thereby continuously generating new adversarial example images. When the loss functions $L_1, \ldots, L_K$, $L_{\text{cascade}}$, and $L_{\text{fusion}}$ are higher than the loss function threshold and the attack success rates $P_1, \ldots, P_K$, $P_{\text{cascade}}$, and $P_{\text{fusion}}$ are higher than the attack success rate threshold, the corresponding adversarial sample image is saved as the multimodal general adversarial sample image for the K single modalities.
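The text states that the cascaded model is obtained by cascading the K single-modal task models and that the fusion model fuses them, but it does not say how. The sketch below shows one possible reading, purely as an assumption: the cascade runs the models in sequence and stops at the first stage that rejects the target, and the fusion model averages the stage confidences. The names `make_cascade`, `make_fusion`, `mean_brightness`, and `peak_brightness` are hypothetical, and the two brightness functions are toy stand-ins for trained detectors.

```python
from typing import Callable, List, Optional, Sequence

Image = List[List[float]]          # single-channel image, pixel values in [0, 1]
Model = Callable[[Image], float]   # returns a detection confidence in [0, 1]


def make_cascade(models: Sequence[Model], threshold: float = 0.5) -> Model:
    """Assumed cascade: run stages in order, stop once a stage rejects the target."""
    def cascade(image: Image) -> float:
        confidence = 1.0
        for model in models:
            confidence = min(confidence, model(image))
            if confidence < threshold:
                break  # later stages are never consulted
        return confidence
    return cascade


def make_fusion(models: Sequence[Model], weights: Optional[Sequence[float]] = None) -> Model:
    """Assumed fusion: weighted average of the per-model confidences."""
    stage_weights = list(weights) if weights is not None else [1.0] * len(models)

    def fusion(image: Image) -> float:
        total = sum(w * m(image) for w, m in zip(stage_weights, models))
        return total / sum(stage_weights)
    return fusion


def mean_brightness(image: Image) -> float:
    """Toy single-modal 'detector': average pixel intensity."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def peak_brightness(image: Image) -> float:
    """Toy single-modal 'detector': brightest pixel."""
    return max(p for row in image for p in row)


scene = [[0.2, 0.8], [0.4, 0.6]]
single_modal_models = [mean_brightness, peak_brightness]
cascade_conf = make_cascade(single_modal_models)(scene)
fusion_conf = make_fusion(single_modal_models)(scene)
```

Whatever construction is actually used, the point for the attack is that both composite models expose the same interface as the single-modal models, so one adversarial image can be scored by all K + 2 of them.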
[0042] Specific Example 1
[0043] For a single-mode image (e.g., a visible light band image) and another single-mode image (e.g., an infrared band image), two or more single-channel single-mode image data are fused to generate single-channel image data containing multimodal information, which is denoted as multimodal fused image data.
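The fusion step can be illustrated with a minimal sketch. The text does not name a fusion algorithm, so pixel-wise weighted averaging is used here as an assumption; `fuse_images`, the equal default weights, and the 2x2 sample images are all hypothetical. Plain Python lists stand in for image arrays to keep the sketch dependency-free.

```python
from typing import List, Optional, Sequence

Image = List[List[float]]  # single-channel image as rows of pixel values in [0, 1]


def fuse_images(images: Sequence[Image], weights: Optional[Sequence[float]] = None) -> Image:
    """Fuse K single-channel images into one single-channel image by weighted averaging."""
    if not images:
        raise ValueError("need at least one image")
    k = len(images)
    if weights is None:
        weights = [1.0 / k] * k  # equal weights: an assumption, not from the source
    if len(weights) != k:
        raise ValueError("one weight per image is required")
    rows, cols = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != rows or any(len(row) != cols for row in img):
            raise ValueError("all images must share the same shape")
    total = sum(weights)
    # Each output pixel is the normalized weighted sum of the co-located input pixels.
    return [
        [sum(w * img[r][c] for w, img in zip(weights, images)) / total for c in range(cols)]
        for r in range(rows)
    ]


visible = [[0.2, 0.8], [0.4, 0.6]]   # toy visible-light band image
infrared = [[0.6, 0.4], [0.0, 1.0]]  # toy infrared band image
fused = fuse_images([visible, infrared])
```

The output has the same shape and channel count as each input, which matches the requirement that the fused image remain single-channel while carrying information from every band.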
[0044] An initial random noise is added to the multimodal fused image data to generate an initial single-channel fused image adversarial example containing multimodal information.
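Adding the initial noise might look like the following sketch. The text only says the noise is random, so the uniform distribution, the bound `epsilon`, the clipping to [0, 1], and the function name `add_random_perturbation` are assumptions made for illustration.

```python
import random
from typing import List, Tuple

Image = List[List[float]]  # single-channel image, pixel values in [0, 1]


def add_random_perturbation(
    image: Image, epsilon: float = 0.05, seed: int = 0
) -> Tuple[Image, Image]:
    """Return (adversarial_image, noise) with noise drawn uniformly from [-epsilon, epsilon]."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    noise = [[rng.uniform(-epsilon, epsilon) for _ in row] for row in image]
    adversarial = [
        # Clip so the perturbed image is still a valid image.
        [min(1.0, max(0.0, pixel + delta)) for pixel, delta in zip(row, noise_row)]
        for row, noise_row in zip(image, noise)
    ]
    return adversarial, noise


fused_image = [[0.4, 0.6], [0.2, 0.8]]
adversarial_image, initial_noise = add_random_perturbation(fused_image)
```

Returning the noise separately is a design choice of this sketch: the later optimization updates the perturbation rather than the image, so it is convenient to keep the two apart.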
[0045] Using the generated single-channel fused image adversarial examples containing multimodal information, adversarial attacks were simultaneously launched against two single-modal image task models, a multimodal cascaded image task model, and a multimodal fused image task model (e.g., the YOLO v5 detection model). The attack success rates P1, P2, P3 (cascaded), and P4 (fused) were recorded.
[0046] Step S4: Using the loss functions L1, L2, L3 (cascaded), and L4 (fusion) of the two single-modal image task models, the multimodal cascaded image task model, and the multimodal fusion image task model, backpropagate to optimize the single-channel fused image adversarial sample containing multimodal information. Repeat the entire process until the attack success rates P1, P2, P3 (cascaded), and P4 (fusion) are all greater than 50%, the loss functions L1, L2, L3 (cascaded), and L4 (fusion) are not lower than the threshold, and the attack success rates have stabilized. At that point the attack is considered effective against all four models, that is, a general attack on the two single-modal image task models, the multimodal cascaded image task model, and the multimodal fusion image task model has been achieved. Record and save the resulting multimodal general adversarial sample.
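The iterate-until-thresholds loop described above can be sketched on a toy problem. Everything concrete here is an assumption: four hypothetical linear detectors stand in for the two single-modal models, the cascaded model, and the fusion model (the example itself mentions YOLO v5); each detector's loss is the negative log of its detection confidence; and the update is a bounded sign-gradient step that raises the summed loss, which is one common way to realize a gradient-based attack rather than the specific procedure of the source. The 50% success rate threshold is taken from the text; the loss threshold of 0.7 is illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]          # flattened single-channel fused image, pixels in [0, 1]
Model = Tuple[Vector, float]  # hypothetical linear detector: (weights, bias)


def detect_confidence(model: Model, image: Vector) -> float:
    """Probability that the toy detector reports the target as present."""
    weights, bias = model
    z = sum(w * x for w, x in zip(weights, image)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def optimize_perturbation(
    images: Sequence[Vector], models: Sequence[Model], epsilon: float = 0.3,
    step: float = 0.02, rate_threshold: float = 0.5, loss_threshold: float = 0.7,
    max_iters: int = 200, seed: int = 0,
) -> Tuple[Vector, List[float], List[float], int]:
    """Update one shared perturbation until every model meets both thresholds."""
    rng = random.Random(seed)
    n = len(images[0])
    delta = [rng.uniform(-0.01, 0.01) for _ in range(n)]  # initial random noise
    rates: List[float] = []
    losses: List[float] = []
    for iteration in range(max_iters):
        rates, losses = [], []
        grad = [0.0] * n
        for model in models:
            fooled, loss_sum = 0, 0.0
            for image in images:
                adv = [min(1.0, max(0.0, x + d)) for x, d in zip(image, delta)]
                conf = detect_confidence(model, adv)
                loss_sum += -math.log(max(conf, 1e-12))  # loss for label "present"
                fooled += int(conf < 0.5)                # a missed detection counts as success
                # d(-log sigmoid(z))/dx_j = -(1 - conf) * w_j (pixel clipping ignored)
                for j, w in enumerate(model[0]):
                    grad[j] += -(1.0 - conf) * w
            rates.append(fooled / len(images))
            losses.append(loss_sum / len(images))
        if all(p > rate_threshold for p in rates) and all(v >= loss_threshold for v in losses):
            return delta, rates, losses, iteration
        # Sign-gradient step that increases the summed loss, kept inside [-epsilon, epsilon].
        delta = [
            max(-epsilon, min(epsilon, d + step * ((g > 0) - (g < 0))))
            for d, g in zip(delta, grad)
        ]
    return delta, rates, losses, max_iters


models = [
    ([2.0, 1.0, 1.0, 2.0], -3.0),  # stand-in for single-modal model 1
    ([1.0, 2.0, 2.0, 1.0], -3.0),  # stand-in for single-modal model 2
    ([1.5, 1.5, 1.5, 1.5], -3.0),  # stand-in for the cascaded model
    ([1.0, 1.0, 2.0, 2.0], -3.0),  # stand-in for the fusion model
]
images = [[0.7, 0.7, 0.7, 0.7], [0.8, 0.6, 0.7, 0.7], [0.6, 0.8, 0.6, 0.8]]
delta, rates, losses, iterations = optimize_perturbation(images, models)
```

Summing the gradients of all four losses before each step is what ties the models together: the perturbation moves only in directions that help against the whole set, and the loop exits only when the weakest of the four results clears both thresholds.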
[0047] Specific Example 2
[0048] Experiments were conducted in the laboratory using a scaled-down car. The car could be captured by four image sensors (visible light, near-infrared, mid-infrared, and far-infrared), representing four imaging modalities.
[0049] As shown in Figure 2, (d) and (l) are visible light vehicle detection, with a detection rate of 100%; (a) and (i) are near-infrared vehicle detection, with detection rates of 98% and 97% respectively; (b) and (j) are far-infrared vehicle detection, with detection rates of 97% and 98% respectively; and (c) and (k) are near-infrared vehicle detection, with detection rates of 98% and 97% respectively.
[0050] As shown in Figure 2, adversarial examples were then added to the vehicle. Panels (h) and (p), (e) and (m), (f) and (n), and (g) and (o) show the successful adversarial attacks against the multimodal task under visible light, near-infrared, mid-infrared, and far-infrared detection, respectively. In each case the detection rate decreased by more than 50% and the detector no longer displayed a detection box. The multimodal detector was thus deceived and the vehicle went undetected, achieving a one-to-many adversarial attack.
[0051] It is evident that the multimodal general adversarial example generation method based on image fusion is sound in principle and can achieve a general attack on both single-channel single-modal image models and multimodal image models. It uses image fusion and multi-loss function optimization to solve the one-to-many general adversarial attack problem for different single-modal and multimodal image task models under a single channel. Image fusion gives the fused adversarial example the multimodal (multi-band) information of the different single-modal images, and multi-loss function optimization yields a general adversarial attack on single-modal and multimodal models within the same framework. Together they represent a novel approach to multimodal adversarial attacks in intelligent adversarial systems.
[0052] A second aspect of this invention proposes a multimodal universal adversarial example generation system based on image fusion. The system includes:
[0053] The first processing unit is configured to: acquire K single-modal single-channel images, fuse the K single-modal single-channel images, and obtain a multimodal single-channel fused image;
[0054] The second processing unit is configured to add random perturbation noise to the multimodal single-channel fused image to generate an adversarial example image of the multimodal single-channel fused image.
[0055] The third processing unit is configured to: construct K single-modal image task models for the K single-modal single-channel images, and further construct a cascaded model and a fusion model of the K single-modal image task models;
[0056] The fourth processing unit is configured to: input the adversarial sample images into the K single-modal image task models, the cascaded model, and the fusion model respectively for adversarial attacks, obtain the attack success rates $P_1, \ldots, P_K$, $P_{\text{cascade}}$, and $P_{\text{fusion}}$ respectively, and calculate the corresponding loss functions $L_1, \ldots, L_K$, $L_{\text{cascade}}$, and $L_{\text{fusion}}$;
[0057] The fifth processing unit is configured to: continuously perform adversarial attacks on the K single-modal image task models, the cascaded model, and the fusion model using a gradient descent-based backpropagation algorithm, until the loss functions $L_1, \ldots, L_K$, $L_{\text{cascade}}$, and $L_{\text{fusion}}$ are higher than the loss function threshold and the attack success rates $P_1, \ldots, P_K$, $P_{\text{cascade}}$, and $P_{\text{fusion}}$ are higher than the attack success rate threshold.
[0058] A third aspect of this invention discloses an electronic device. The electronic device includes a memory and a processor. The memory stores a computer program, and when the processor executes the computer program, it implements a multimodal general adversarial example generation method based on image fusion, as disclosed in the first aspect of this invention.
[0059] Figure 3 is a structural diagram of an electronic device according to an embodiment of the present invention. As shown in Figure 3, the electronic device includes a processor, memory, communication interface, display screen, and input device connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The communication interface is used for wired or wireless communication with external terminals; wireless communication can be achieved through Wi-Fi, carrier networks, Near Field Communication (NFC), or other technologies. The display screen can be an LCD screen or an e-ink screen. The input device can be a touch layer covering the display screen; buttons, a trackball, or a touchpad mounted on the device's casing; or an external keyboard, touchpad, or mouse.
[0060] Those skilled in the art will understand that the structure shown in Figure 3 is merely a structural diagram of the part related to the technical solution of this disclosure and does not constitute a limitation on the electronic device to which the solution of this application is applied. A specific electronic device may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement.
[0061] A fourth aspect of this invention discloses a computer-readable storage medium. The computer-readable storage medium stores a computer program, which, when executed by a processor, implements a multimodal general adversarial example generation method based on image fusion, as described in the first aspect of this disclosure.
[0062] In summary, this invention uses image fusion technology to fuse single-band images: two single-channel single-band (single-modal) images are fused to generate a single-channel fused image containing the multi-band (multi-modal) information of both. Random perturbations are then added to the fused image. By simultaneously attacking single-modal models, multi-modal composite models, and multi-modal fusion models, and by optimizing the perturbation through backpropagation of the loss functions of these models, a general adversarial capability that can attack single-modal task models and multi-modal task models at the same time is finally achieved.
[0063] Please note that the technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments have been described. However, as long as the combination of these technical features does not contradict each other, it should be considered within the scope of this specification. The above embodiments only illustrate several implementation methods of this application, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be pointed out that for those skilled in the art, several modifications and improvements can be made without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application should be determined by the appended claims.
Claims
1. A multimodal general adversarial example generation method based on image fusion, characterized in that the method includes:
Step S1: Obtain K single-modal single-channel images, and fuse the K single-modal single-channel images to obtain a multimodal single-channel fused image;
Step S2: Add random perturbation noise to the multimodal single-channel fused image to generate adversarial example images of the multimodal single-channel fused image;
Step S3: For the K single-modal single-channel images, construct K single-modal image task models, and construct a cascaded model and a fusion model of the K single-modal image task models;
Step S4: Input the adversarial sample images into the K single-modal image task models, the cascaded model, and the fusion model respectively to perform adversarial attacks, obtain the attack success rates $P_1, \ldots, P_K$, $P_{\text{cascade}}$, and $P_{\text{fusion}}$ respectively, and calculate the corresponding loss functions $L_1, \ldots, L_K$, $L_{\text{cascade}}$, and $L_{\text{fusion}}$;
Step S5: Use a gradient descent-based backpropagation algorithm to continuously perform adversarial attacks on the K single-modal image task models, the cascaded model, and the fusion model until the loss functions $L_1, \ldots, L_K$, $L_{\text{cascade}}$, and $L_{\text{fusion}}$ are higher than the loss function threshold and the attack success rates $P_1, \ldots, P_K$, $P_{\text{cascade}}$, and $P_{\text{fusion}}$ are higher than the attack success rate threshold;
wherein, in step S5, the random perturbation noise added to the multimodal single-channel fused image is continuously updated, thereby continuously generating new adversarial example images, and when the loss functions $L_1, \ldots, L_K$, $L_{\text{cascade}}$, and $L_{\text{fusion}}$ are higher than the loss function threshold and the attack success rates $P_1, \ldots, P_K$, $P_{\text{cascade}}$, and $P_{\text{fusion}}$ are higher than the attack success rate threshold, the corresponding adversarial sample image is saved as the multimodal general adversarial sample image for the K single modalities.
2. The multimodal general adversarial example generation method based on image fusion according to claim 1, characterized in that the K single-modal single-channel images have different modalities, totaling K single-modal types, and the K single-modal image task models correspond to the K single-modal types.
3. The multimodal general adversarial example generation method based on image fusion according to claim 2, characterized in that the cascaded model is the model obtained by cascading the K single-modal image task models, and the fusion model is the model used to fuse the K single-modal models.
4. The multimodal general adversarial example generation method based on image fusion according to claim 3, characterized in that the K single-modal image task models, the cascaded model, and the fusion model are all trained and optimized models with recognition accuracy higher than the optimization threshold.
5. A multimodal general adversarial example generation system based on image fusion, characterized in that the system includes:
a first processing unit configured to: acquire K single-modal single-channel images, fuse the K single-modal single-channel images, and obtain a multimodal single-channel fused image;
a second processing unit configured to: add random perturbation noise to the multimodal single-channel fused image to generate an adversarial example image of the multimodal single-channel fused image;
a third processing unit configured to: construct K single-modal image task models for the K single-modal single-channel images, and construct a cascaded model and a fusion model of the K single-modal image task models;
a fourth processing unit configured to: input the adversarial sample images into the K single-modal image task models, the cascaded model, and the fusion model respectively for adversarial attacks, obtain the attack success rates $P_1, \ldots, P_K$, $P_{\text{cascade}}$, and $P_{\text{fusion}}$ respectively, and calculate the corresponding loss functions $L_1, \ldots, L_K$, $L_{\text{cascade}}$, and $L_{\text{fusion}}$;
a fifth processing unit configured to: continuously perform adversarial attacks on the K single-modal image task models, the cascaded model, and the fusion model using a gradient descent-based backpropagation algorithm, until the loss functions $L_1, \ldots, L_K$, $L_{\text{cascade}}$, and $L_{\text{fusion}}$ are higher than the loss function threshold and the attack success rates $P_1, \ldots, P_K$, $P_{\text{cascade}}$, and $P_{\text{fusion}}$ are higher than the attack success rate threshold;
wherein the fifth processing unit is specifically configured to: continuously update the random perturbation noise added to the multimodal single-channel fused image, thereby continuously generating new adversarial example images, and when the loss functions $L_1, \ldots, L_K$, $L_{\text{cascade}}$, and $L_{\text{fusion}}$ are higher than the loss function threshold and the attack success rates $P_1, \ldots, P_K$, $P_{\text{cascade}}$, and $P_{\text{fusion}}$ are higher than the attack success rate threshold, save the corresponding adversarial sample image as the multimodal general adversarial sample image for the K single modalities.
6. An electronic device, characterized in that the electronic device includes a memory and a processor, the memory stores a computer program, and when the processor executes the computer program, it implements the multimodal general adversarial sample generation method based on image fusion as described in any one of claims 1-4.
7. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the multimodal general adversarial sample generation method based on image fusion as described in any one of claims 1-4.