A targeted adversarial attack enhancement method based on channel feature selection

Uses Grad-CAM to generate saliency maps and a channel feature selection method to optimize adversarial perturbations. This addresses the poor transferability of targeted attacks in black-box scenarios and their overfitting to the surrogate model, and it improves the attack effectiveness of adversarial examples.

CN118736395B · Active · Publication Date: 2026-06-12 · BEIJING UNIV OF POSTS & TELECOMM

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING UNIV OF POSTS & TELECOMM
Filing Date
2024-06-14
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

In existing black-box scenarios, adversarial examples transfer poorly in targeted attacks and tend to overfit the surrogate (alternative) model, so the attack results are unsatisfactory.

Method used

A targeted adversarial attack enhancement method based on channel feature selection is adopted. The Grad-CAM algorithm is used to generate a saliency map. By combining channel feature selection and loss optimization, the adversarial perturbation is optimized to improve transferability.

Benefits of technology

It significantly improves the transferability of adversarial examples in targeted attack scenarios, alleviates the problem of adversarial examples overfitting to alternative models, and enhances the effectiveness of black-box attacks.

✦ Generated by Eureka AI based on patent content.

Figure CN118736395B_ABST

Abstract

The application discloses a targeted adversarial attack enhancement method based on channel feature selection, which belongs to the technical field of artificial intelligence security and comprises the following steps: a heatmap is calculated with the Grad-CAM algorithm to obtain a salient region; the salient region is filled into a saliency map; a local image is obtained by randomly cropping the saliency map; the local image is scaled to the same size as the original image; the original image and the local image, with the same adversarial perturbation added to each, are input into a CNN; and a channel feature selection method is applied to optimize the adversarial perturbation. The application can substantially improve the transferability of adversarial samples in targeted attacks. It newly applies model attention to targeted adversarial attacks, so that the perturbation learns to better transfer the salient features of the original image to the target category. By improving the transferability of adversarial samples, the application raises the success rate of black-box attacks.

Description

Technical Field

[0001] This invention belongs to the field of artificial intelligence security technology and specifically relates to a targeted adversarial attack enhancement method based on channel feature selection.

Background Technology

[0002] An adversarial attack deceives models such as deep neural networks (DNNs) with malicious inputs formed by adding carefully designed perturbations, imperceptible to the human eye, to benign samples. Such inputs are called adversarial examples, and the perturbations added to the original samples are called adversarial perturbations. Adversarial attacks have become a significant security issue in practical applications of deep neural networks, such as autonomous driving and facial recognition.

[0003] Adversarial attacks can be categorized into white-box and black-box attacks based on whether the attacker knows the internal parameters of the attacked model. A white-box attack means the attacker has complete knowledge of the target model's structure and internal parameters, while a black-box attack means the attacker can only input data into the model and obtain its output, without knowing its structure or parameters. Existing adversarial attacks perform very well in white-box scenarios where the adversary has complete knowledge of the target model. However, most attacks are generally less effective in black-box scenarios where the adversary is unaware of the target model's settings, especially when the target model has defensive mechanisms.

[0004] In black-box settings, the most popular approaches are query-based and adversarial transferability-based methods. Query-based methods typically require numerous queries to the target model to approximate its gradient, making them difficult to implement when the target model has query limitations. Transferability-based methods offer a more practical approach by using a proxy model to generate highly transferable adversarial examples for black-box attacks. Based on the method of adversarial example generation, adversarial attacks can be categorized into generative and iterative methods. Generative methods: Attackers use generative models to generate adversarial examples. As long as the distribution of benign examples remains unchanged, the effectiveness of the adversarial examples depends on how the generator is trained. Many previous works on generative attacks aimed to learn a powerful generator that maps the distribution of clean images to the distribution of adversarial examples that can generalize well across different black-box models. However, generative methods often require significant resources to train a generator, making them very expensive in practice. Iterative methods can generate adversarial perturbations for specific samples to launch adversarial attacks. Compared to generative methods, iterative methods do not require the significant overhead of training a generator for a specific class beforehand, making them more feasible in practical applications. Although this method achieves good performance in white-box attack scenarios, the transferability of the adversarial examples generated by this method is limited in black-box scenarios.

[0005] The closest technical solution to this invention is the Self-Universality (SU) method. This method holds that the more universal the adversarial perturbation, the more transferable the adversarial example, so black-box attack performance can be improved by increasing the universality of the perturbation. Existing Universal Adversarial Perturbation (UAP) attack methods require a large number of additional images to improve the universality of the perturbation, whereas the SU method needs no images other than the original one. In the SU method, a block is first randomly cropped from the global image (GlobalInput) and then resized to the same size as the original image to obtain a local image (LocalInput); this constitutes one data augmentation step. In each iteration a new local image is obtained from the global image and used in place of additional images to improve the universality of the perturbation. After the same shared perturbation is added to both the global and local images, they are input into a convolutional neural network (CNN). The outputs for the global and local images are used to compute the classification loss with either CE loss or Logit loss; this loss shifts the feature distribution of the original image toward the target class through the adversarial perturbation. From an intermediate layer of the CNN, the global features and local features are taken and their cosine similarity is calculated as the feature similarity loss, whose purpose is to improve the universality of the adversarial perturbation. The total loss combines the classification loss and the feature similarity loss. The total loss is then backpropagated to obtain its gradient with respect to the perturbation, and this gradient is used to update the adversarial perturbation, completing one iteration.
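The SU total loss described in [0005] can be illustrated numerically. This is a minimal NumPy sketch, not the SU authors' implementation: it assumes the two classification losses (CE or Logit) have already been computed, and it assumes the feature-similarity term is subtracted with a weight `lam`, so that more similar global and local features lower the loss being minimized. The function names and this sign convention are assumptions.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-12):
    # Flatten the intermediate-layer features and compare their directions
    a, b = np.ravel(a), np.ravel(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def su_total_loss(cls_global, cls_local, feat_global, feat_local, lam):
    # cls_*: classification losses (CE or Logit) of the global and local
    # inputs toward the target class; both are minimized.
    classification = cls_global + cls_local
    # Assumed sign: similar global/local features reduce the total loss,
    # which pushes the shared perturbation toward universality.
    similarity = cosine_similarity(feat_global, feat_local)
    return classification - lam * similarity
```

In an actual attack these values would be tensors in an autodiff framework so that the total loss can be differentiated with respect to the shared perturbation.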

[0006] Despite advancements in methods for improving the transferability of adversarial examples in black-box scenarios, they still suffer from several drawbacks. Generative methods have achieved good results, but they require pre-training a robust generator before generating perturbations. Furthermore, in many cases a well-trained generator is only applicable to a specific target class, making generative methods costly in practice. One significant challenge for iterative methods is mitigating the overfitting of adversarial examples to surrogate models: adversarial perturbations trained on surrogate models perform poorly on other black-box models, especially when the structures of the black-box and surrogate models differ significantly. Many existing iterative methods have mitigated this problem to some extent, but their effectiveness in targeted attacks remains insufficient. Methods that improve the transferability of adversarial examples by increasing the universality of adversarial perturbations show promising results in untargeted attacks, but experiments have revealed their limitations in targeted attacks. Therefore, mitigating the overfitting of adversarial examples to surrogate models in targeted attack scenarios remains a challenge in image adversarial attacks. In addition, some methods improve the transferability of adversarial examples from the perspective of attention, but these methods are only applicable to non-targeted attacks.

Summary of the Invention

[0007] To address the difficulty of shifting the feature distribution of adversarial examples from the original category to the target category through adversarial perturbations, and the overfitting of adversarial examples to surrogate models, this invention proposes a targeted adversarial attack enhancement method based on channel feature selection.

[0008] This invention is based on an iterative method that uses the model's attention to obtain a saliency map. The saliency map enables adversarial perturbations to better learn how to transfer the feature distribution of the original image to the target category. By using attention and channel feature selection methods, the transferability of adversarial examples is improved, further alleviating the problem of targeted adversarial examples overfitting to alternative models. This greatly enhances the transferability of adversarial examples in targeted attack scenarios and strengthens the effectiveness of black-box attacks.

[0009] The technical solution adopted by this invention to solve the technical problem is as follows:

[0010] This invention provides a targeted adversarial attack enhancement method based on channel feature selection, which mainly includes the following steps:

[0011] Step S1: Calculate the heatmap using the Grad-CAM algorithm to obtain the salient regions, and fill the salient regions into a saliency map;

[0012] Step S2: Randomly crop the saliency map to obtain a local image, scale the local image to the same size as the original image, add the same adversarial perturbation to the original image and the local image, input them into the CNN, and apply the channel feature selection method while optimizing the adversarial perturbation through the loss function.

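[0013] Furthermore, in step S1, the importance of each feature channel to category t is first approximated using the following formula:

[0014]
$$\alpha_c^{k,t} = \frac{1}{Z}\sum_{i=1}^{m}\sum_{j=1}^{n}\frac{\partial f_t(x)}{\partial A_{ij}^{k,c}}$$

[0015] where $\alpha_c^{k,t}$ represents the attention weight of the c-th channel in the k-th layer to category t; $A^{k,c}$ denotes the c-th channel of the k-th layer's feature map and $A_{ij}^{k,c}$ its value at position (i, j); Z represents the normalization constant (in Grad-CAM, Z = m × n); m and n represent the height and width of each channel in the k-th layer, respectively; f represents the substitution model, f(x) represents the output after inputting sample x into the substitution model, and $f_t(x)$ is its output for category t;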

[0016] Then, all channels in a feature layer are weighted and combined to obtain a heatmap:

[0017]
$$M^{k,t} = \sum_{c}\alpha_c^{k,t}A^{k,c}$$

[0018] where $M^{k,t}$ represents the heatmap of the k-th feature layer for category t, and $A^{k,c}$ is the c-th channel of that layer.

[0019] Finally, the heatmap $M^{k,t}$ is processed with the ReLU activation function and scaled to the same size as the original image.
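The heatmap computation in [0013] to [0019] can be sketched in NumPy, assuming the layer-k activations and the gradients of the category-t output with respect to them have already been obtained from an autodiff framework. Two details are assumptions rather than part of the text: nearest-neighbour upsampling stands in for the interpolation a real implementation would use, and the final normalization to [0, 1] is added so that a fixed threshold such as 0.5 is meaningful.

```python
import numpy as np

def grad_cam(activations, gradients, out_hw):
    """Grad-CAM heatmap for one layer k and target category t.

    activations: A^k, shape (C, m, n)
    gradients:   d f_t(x) / d A^k, same shape
    out_hw:      (H, W) size of the original image
    """
    C, m, n = activations.shape
    # alpha_c = (1/Z) * sum_ij gradient, with Z = m * n (spatial average)
    alpha = gradients.reshape(C, -1).sum(axis=1) / (m * n)
    # Weighted sum over channels, then ReLU keeps only evidence for category t
    heat = np.maximum(np.tensordot(alpha, activations, axes=1), 0.0)
    # Nearest-neighbour upsampling to the original image size
    H, W = out_hw
    rows = np.arange(H) * m // H
    cols = np.arange(W) * n // W
    heat = heat[rows][:, cols]
    # Assumed extra step: normalize to [0, 1] for later thresholding
    peak = heat.max()
    return heat / peak if peak > 0 else heat
```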

[0020] Furthermore, in step S1, salient regions are first extracted from the heatmap and then processed to generate a saliency map $x_{sa}$.

[0021] Furthermore, salient regions are extracted from the heatmap with either a pixel-level extraction method or a rectangular extraction method. The pixel-level extraction method operates as follows:

[0022] The parts of the heatmap whose values are greater than the threshold $\epsilon_b$ are retained, according to the formula:

[0023]
$$x_b(i,j) = \begin{cases} x(i,j), & M(i,j) > \epsilon_b \\ 0, & \text{otherwise} \end{cases}$$

[0024] where the effective (retained) part of $x_b$ is the salient block, and positions at or below the threshold are treated as empty (written as 0 above); $x_b$ has the same dimensions as the original image x; M is the heatmap after ReLU and scaling to the image size; and i, j represent the position coordinates of each element in the image;

[0025] The rectangular extraction method takes the smallest rectangle enclosing the entire effective portion of $x_b$ as the salient block.
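Both extraction methods can be sketched as follows, assuming an (H, W, 3) image, a heatmap already scaled to (H, W) and normalized to [0, 1], and that pixels at or below the threshold are set to zero. Whether the rectangle is cut from the original image or from the masked block is not stated; the sketch cuts it from the original image. The function names are hypothetical.

```python
import numpy as np

def pixel_level_block(x, heat, eps_b=0.5):
    # Keep pixels whose heat exceeds eps_b; others are set to 0 (assumption)
    mask = heat > eps_b
    return x * mask[..., None], mask

def rectangle_block(x, mask):
    # Smallest axis-aligned rectangle containing every retained pixel
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return x[:0, :0]  # nothing exceeded the threshold
    return x[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

# Usage: two salient pixels at (1, 1) and (2, 3)
x = np.random.default_rng(0).random((4, 4, 3))
heat = np.zeros((4, 4)); heat[1, 1] = 0.9; heat[2, 3] = 0.8
x_b, mask = pixel_level_block(x, heat)
block = rectangle_block(x, mask)  # rows 1-2, cols 1-3 -> shape (2, 3, 3)
```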

[0026] Furthermore, the saliency map $x_{sa}$ is generated in one of two ways: the long side of the salient block is scaled directly to the size of the original image and the empty parts are filled with uniform noise around the average color of the original image; or the salient block is copied four times and placed in the top, bottom, left, and right parts of the image, and the empty parts are filled with uniform noise around the average color of the original image.
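The first saliency-map mode (scale the long side, fill the rest with noise) might look like the sketch below. Assumptions not given in the text: a square image with values in [0, 1], the block centred on the canvas, nearest-neighbour resizing, and a noise half-width `spread`. The four-copy mode is not sketched because its exact layout is not specified.

```python
import numpy as np

def resize_nn(img, out_h, out_w):
    # Nearest-neighbour resize (a stand-in for proper interpolation)
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

def saliency_map_scaled(x, block, spread=0.1, seed=0):
    """Scale the block's long side to the image size and fill the rest
    with uniform noise around the mean colour of x (values in [0, 1])."""
    rng = np.random.default_rng(seed)
    size = x.shape[0]  # assumes a square size x size image
    mean = x.reshape(-1, x.shape[-1]).mean(axis=0)
    noise = rng.uniform(-spread, spread, size=(size, size, x.shape[-1]))
    canvas = np.clip(mean + noise, 0.0, 1.0)
    bh, bw = block.shape[:2]
    scale = size / max(bh, bw)  # long side -> image size
    nh = min(size, max(1, round(bh * scale)))
    nw = min(size, max(1, round(bw * scale)))
    top, left = (size - nh) // 2, (size - nw) // 2  # centring is an assumption
    canvas[top:top + nh, left:left + nw] = resize_nn(block, nh, nw)
    return canvas
```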

[0027] Furthermore, in step S2, during the adversarial perturbation generation stage, the saliency map corresponding to the original image is first randomly cropped and scaled to make its size the same as the original image, in order to increase the diversity of the local image and thus increase the transferability of the adversarial example.

[0028] Furthermore, the RandomResizedCrop function from the torchvision library is used to randomly crop and scale the saliency map. RandomResizedCrop requires the cropping range parameter $s = \{s_l, s_{int}\}$, where $s_l$ is the lower limit of the area of the randomly cropped portion as a fraction of the area of the original image x, and $s_{int}$ is the width of the allowed range, so that $s_l + s_{int}$ is the upper limit. RRC(x, s) denotes the random cropping and scaling operation on the original image x.
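With torchvision, this crop range would correspond to `RandomResizedCrop(size, scale=(s_l, s_l + s_int))`. The NumPy function below is only a simplified stand-in for RRC(x, s), not torchvision's algorithm: it draws square crops, whereas torchvision also samples the aspect ratio, and it resizes by nearest neighbour.

```python
import numpy as np

def random_resized_crop(img, s_l, s_int, rng):
    """Simplified RRC(x, s) with s = (s_l, s_int): a square crop whose area
    fraction is uniform in [s_l, s_l + s_int], resized back to input size."""
    H, W = img.shape[:2]
    frac = rng.uniform(s_l, s_l + s_int)
    side = int(round(np.sqrt(frac * H * W)))
    side = min(max(side, 1), H, W)
    top = rng.integers(0, H - side + 1)
    left = rng.integers(0, W - side + 1)
    crop = img[top:top + side, left:left + side]
    # Nearest-neighbour resize back to (H, W)
    rows = np.arange(H) * side // H
    cols = np.arange(W) * side // W
    return crop[rows][:, cols]
```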

[0029] Furthermore, in step S2, the specific operation flow of the channel feature selection method is as follows:

[0030] Let $F_l^{in}(x)$ and $F_l(x)$ denote the input and output, respectively, of layer l of model f when the original image x is input, and let $C_l$, $H_l$, and $W_l$ denote the number of channels, channel height, and channel width of the output of layer l, so that $F_l(x) \in \mathbb{R}^{C_l \times H_l \times W_l}$. The channel feature selection method is then:

[0031]
$$\hat{F}_l^{(i)}(x) = P_i \cdot F_l^{(i)}(x), \quad i = 1, \dots, C_l$$

[0032] where $F_l^{(i)}(x)$ represents the value of the i-th channel of the output of layer l of model f for the original image x; P is an array with one element per channel, $P_i \sim \mathrm{Bernoulli}(p_b)$; and $p_b$ is a predefined constant.
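Channel feature selection amounts to a per-channel Bernoulli mask on the layer-l output, redrawn at each forward pass. A minimal NumPy sketch follows; in PyTorch such a mask could be applied to the chosen layer through a forward hook (`register_forward_hook`), so that dropped channels also receive no gradient. Whether kept channels are rescaled by 1/p_b, as in dropout, is not stated, so the sketch does not rescale.

```python
import numpy as np

def channel_feature_selection(feat, p_b, rng):
    """Randomly keep channels of a layer-l output.

    feat: array of shape (C_l, H_l, W_l)
    Each channel i is multiplied by P_i ~ Bernoulli(p_b); a fresh mask per
    forward pass keeps the gradient from concentrating on a few channels.
    """
    P = rng.binomial(1, p_b, size=feat.shape[0])
    return feat * P[:, None, None], P
```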

[0033] Furthermore, in step S2, the loss function is first defined using the obtained saliency map, channel feature selection method, and DI method:

[0034]

[0035] where the model appearing in the equation is the surrogate model with channel feature selection applied at layer l; the parameter λ weights the second term of the equation; J represents the loss function; and $x_{sa}$ represents the saliency map;

[0036] The objective function is:

[0037]
$$\min_{\delta} L(\delta) \quad \text{s.t.} \quad \|\delta\|_\infty \le \epsilon$$

[0038] where ε represents the upper bound of the perturbation;

[0039] The loss function L(δ) guides the iterative update of the adversarial perturbation δ. Each iteration updates the perturbation as follows:

[0040]

[0041] $\delta_{i+1} = \delta_i - \alpha \cdot \mathrm{sign}(g_{i+1})$,

[0042] $\delta_{i+1} = \mathrm{Clip}_{x,\epsilon}(\delta_{i+1})$.
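To make the sign-step-and-clip update concrete, here is a targeted I-FGSM loop against a toy linear model (logits = weights @ x), for which the gradient of the Logit loss has a closed form. The toy model, the function name, and the extra clip to the [0, 1] pixel range are illustrative assumptions; the saliency-map term, channel selection, and DI/TI/MI components of the full method are omitted.

```python
import numpy as np

def targeted_i_fgsm(x, weights, y_t, eps, alpha, iters):
    """Targeted I-FGSM on a toy linear model logits = weights @ x.ravel(),
    minimizing the Logit loss J = -logit[y_t]."""
    delta = np.zeros_like(x)  # delta_0 = 0
    for _ in range(iters):
        # Gradient of J with respect to delta; constant for a linear model
        g = -weights[y_t].reshape(x.shape)
        delta = delta - alpha * np.sign(g)        # step that decreases the loss
        delta = np.clip(delta, -eps, eps)         # Clip_{x,eps}: L_inf ball
        delta = np.clip(x + delta, 0.0, 1.0) - x  # assumed: keep pixels valid
    return delta
```

For a real CNN, `g` would be obtained by backpropagating the loss to δ at every iteration instead of being a constant.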

[0043] Furthermore, the loss function J adopts either CE loss or Logit loss.
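The two choices for J in a targeted attack can be written as below. The CE form is standard; the Logit form (the negative target logit) is a common definition in the transfer-attack literature and is assumed here rather than taken from the text.

```python
import numpy as np

def ce_loss(logits, y_t):
    # Targeted cross-entropy: -log softmax(logits)[y_t], computed stably
    z = logits - logits.max()
    return float(np.log(np.exp(z).sum()) - z[y_t])

def logit_loss(logits, y_t):
    # Targeted Logit loss: negative target logit, minimized by raising it
    return float(-logits[y_t])
```

A commonly cited reason for preferring the Logit loss in targeted transfer attacks is that the CE gradient shrinks once the target class dominates the softmax, while the Logit loss keeps a constant gradient on the target logit.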

[0044] The beneficial effects of this invention are:

[0045] Many existing iterative methods have alleviated the problem of adversarial examples overfitting to alternative models to some extent, but their performance is still insufficient in targeted attacks. Furthermore, some existing methods improve the transferability of adversarial examples by addressing model attention, but these methods are only applicable to non-targeted attacks. Compared to existing iterative attack methods, this invention provides a targeted adversarial attack enhancement method based on channel feature selection, which can significantly improve the transferability of adversarial examples in targeted attacks. This invention innovatively applies model attention to targeted adversarial attacks, enabling perturbations to learn how to better transfer salient features of the original image to the target category. Compared with existing technologies, this invention improves the technical effect in the following two aspects:

[0046] (1) A heatmap is calculated using attention to obtain the region of greatest interest to the model, thereby generating a saliency map. In the context of targeted adversarial attacks, applying a local image cropped from the saliency map as "noise" to the adversarial perturbation can help the perturbation better learn how to transfer features from the original image to the target category, thus improving the transferability of adversarial examples to some extent. Furthermore, in the context of targeted adversarial attacks, the targeted adversarial attack enhancement method based on channel feature selection provided in this invention is the first attention-based method to improve the transferability of adversarial examples.

[0047] (2) Based on the channel feature selection method, the problem of adversarial examples overfitting to white-box substitution models can be largely alleviated, increasing the transferability of adversarial attacks. This invention finds that in targeted adversarial attack scenarios, using different iterative attack algorithms will cause the generation process of adversarial perturbations to focus on a few specific features. This reduces the generalization ability of adversarial examples on other black-box models, making them only applicable to substitution models. The channel feature selection method can alleviate this problem.

[0048] In summary, the targeted adversarial attack enhancement method based on channel feature selection provided by this invention focuses on improving the transferability of adversarial samples, thereby increasing the success rate of black-box attacks.

Attached Figure Description

[0049] Figure 1 is a schematic diagram of the targeted adversarial attack enhancement method based on channel feature selection provided by the present invention.

[0050] Figure 2 shows a heatmap generated with the ResNet50 model in this invention.

[0051] Figure 3 shows different saliency maps obtained by the present invention.

[0052] Figure 4 is a graph of channel activation levels.

[0053] Figure 5 shows the attack results on standard models.

[0054] Figure 6 shows the attack results on robust models.

Detailed Implementation

[0055] The present invention will be further described in detail below with reference to the accompanying drawings.

[0056] This invention provides a targeted adversarial attack enhancement method based on channel feature selection. The key idea is to utilize attention to better shift the feature distribution of the original image to the target category using adversarial perturbations, and to further mitigate the problem of targeted adversarial examples overfitting to alternative models using channel selection. In short, an attention mechanism is first used to extract the most relevant regions (i.e., salient regions) of the CNN in the original image through a heatmap. The local images of the salient regions and the original image are then fed into the CNN with the same adversarial perturbation. Furthermore, a channel selection mechanism is proposed. During CNN inference, channels of a certain feature layer are randomly selected. This prevents gradients from focusing excessively on specific channels, thus mitigating the problem of adversarial perturbations overfitting to alternative models.

[0057] As shown in Figure 1, the present invention provides a targeted adversarial attack enhancement method based on channel feature selection, which specifically includes the following steps:

[0058] Step S1: Generate saliency map;

[0059] The heatmap is calculated using the Grad-CAM algorithm to obtain salient regions, which are then filled into a saliency map. The specific steps are as follows:

[0060] Step S1.1: Calculation of the heat map;

[0061] Unlike traditional image classification methods based on hand-designed features, deep learning-based image classifiers are renowned for their ability to automatically extract discriminative features from images. This means that CNNs inherently possess some weak attention mechanisms. The Grad-CAM algorithm assumes that convolutional layers naturally preserve the spatial information lost in fully connected layers. Therefore, this invention can expect the last few convolutional layers to achieve the optimal trade-off between high-level semantics and detailed spatial information. Although different models may focus on different features for the same image, they often focus on similar salient features. For example, for an image of a dog, different models might all focus on the dog's nose, eyes, and mouth. In targeted attack scenarios, adversarial perturbations should learn better how to transfer the key feature distribution of the original image to the target category, thus improving the transferability of adversarial examples between different black-box models.

[0062] Therefore, to obtain the region of greatest interest to the model, this invention needs to determine how important different features are to the model's decision, i.e., the model's attention. Each channel of a feature layer can be viewed as a feature extractor, and the importance of channel c in layer k to category t can be approximated with the following formula:

[0063]
$$\alpha_c^{k,t} = \frac{1}{Z}\sum_{i=1}^{m}\sum_{j=1}^{n}\frac{\partial f_t(x)}{\partial A_{ij}^{k,c}}$$

[0064] where $\alpha_c^{k,t}$ represents the attention weight of the c-th channel in the k-th layer to category t; $A^{k,c}$ denotes the c-th channel of the k-th layer's feature map and $A_{ij}^{k,c}$ its value at position (i, j); Z represents the normalization constant (in Grad-CAM, Z = m × n); m and n represent the height and width of each channel in the k-th layer, respectively; f represents the substitution model, f(x) represents the output after inputting sample x into the substitution model, and $f_t(x)$ is its output for category t.

[0065] Based on the calculated attention weights, a heatmap can be generated. The heatmap reflects the region the model focuses on for a particular category, i.e., the key features. Specifically, this invention weights and combines all channels in a feature layer to obtain a heatmap:

[0066]
$$M^{k,t} = \sum_{c}\alpha_c^{k,t}A^{k,c}$$

[0067] where $M^{k,t}$ represents the heatmap of the k-th feature layer with respect to category t, and $A^{k,c}$ is the c-th channel of that layer. Note that the resolution of this heatmap differs from that of the original image (e.g., the channels of the last layer of the VGG model are 14×14), so it must be scaled to the size of the original image. The heatmap is processed with the ReLU activation function because only the part of the heatmap greater than zero, i.e., the part positively correlated with the prediction of class t, is of interest.

[0068] In this embodiment, the heatmap generated with the ResNet50 model is shown in Figure 2: the left image is the original image, and the right image is the original overlaid with the heatmap. The ResNet50 model correctly classifies the image as a dog, and the heatmap shows that the model pays particular attention to the dog's features in the image.

[0069] Step S1.2: Saliency map generation;

[0070] Previous research suggested that increasing the universality of adversarial perturbations correspondingly improves the transferability of adversarial examples, and many studies now focus on improving this universality. Simply put, to an adversarial example the original image acts more like noise; that is, the prediction for an adversarial example correlates more strongly with the adversarial perturbation than with the original image. Training the adversarial perturbation on different images can therefore increase its universality. The Self-Universality (SU) method builds on this idea, using local images as noise to improve the universality of the adversarial perturbation. However, experiments in this invention show that the benefit of universality holds only in non-targeted attack scenarios: in targeted attack scenarios, training the adversarial perturbation on different images actually reduces its attack effectiveness. This invention finds instead that training adversarial perturbations with saliency maps significantly improves the transferability of adversarial examples. The reasoning is that saliency maps let the perturbation learn to transfer features of the original image to features of the target class in a more transferable way, because although different models may focus on different features of the same image, the most critical features they attend to are similar. Likewise, the effectiveness of the SU method can be attributed to the fact that its local images are likely to contain salient features.

[0071] After the heatmap has been scaled to the size of the original image, salient portions are extracted from it to form a saliency map. Specifically, salient areas are first extracted from the heatmap. One extraction method retains the parts of the heatmap whose values exceed a threshold $\epsilon_b$, according to the following formula:

[0072]
$$x_b(i,j) = \begin{cases} x(i,j), & M(i,j) > \epsilon_b \\ 0, & \text{otherwise} \end{cases}$$

[0073] where the effective part of $x_b$ is the salient block, and positions at or below the threshold are treated as empty (written as 0 above); $x_b$ has the same dimensions as the original image; M is the heatmap after ReLU and scaling; and i, j represent the position coordinates of each element in the image. Alternatively, a rectangular cropping method can take the smallest rectangle enclosing the entire effective portion of $x_b$ as the salient block. The two methods suit different situations: the first, pixel-level cropping, works better when the image contrast is low and the texture is simple, while the second works better when the image contrast is high or the texture is complex. In tests of this invention, the attack is most effective when the threshold $\epsilon_b$ is 0.5.

[0074] The present invention then operates on the salient block to generate a saliency map $x_{sa}$. Several modes of saliency map generation have been explored in this invention. For example, the long side of the salient block is scaled directly to the size of the original image and the missing parts are filled with uniform noise around the average color of the original image; or the salient block is copied four times and placed in the top, bottom, left, and right parts of the image, with the missing parts filled with uniform noise around the average color of the original image.

[0075] The different saliency maps $x_{sa}$ obtained in this embodiment are shown in Figure 3. The original image on the left has low contrast and simple texture, so pixel-level cropping is used; the image on the right has high contrast and complex texture, so rectangular cropping is used.

[0076] Step S2: Disturbance generation;

[0077] The process involves randomly cropping a local image from the saliency map, scaling the local image to the same size as the original image, applying the same adversarial perturbation to both the original and local images, and then inputting them into a CNN. A channel feature selection method is then applied, and the adversarial perturbation is optimized using a loss function. The specific steps are as follows:

[0078] In the adversarial perturbation generation stage, the present invention first randomly crops and scales the saliency map $x_{sa}$ corresponding to the original image so that it has the same size as the original image. In practice, this invention uses the `RandomResizedCrop` function from the `torchvision` library. This data augmentation step not only increases the diversity of the local images, and thereby the transferability of adversarial examples, but also, because the saliency map $x_{sa}$ contains the region the model is most interested in, lets the adversarial perturbation better learn how to transfer the feature distribution of the original image to the target category. RandomResizedCrop requires the cropping range parameter $s = \{s_l, s_{int}\}$, where $s_l$ is the lower limit of the area of the randomly cropped portion as a fraction of the area of the original image x, and $s_{int}$ is the width of the range, so that $s_l + s_{int}$ is the upper limit. In this invention, RRC(x, s) denotes the random cropping and scaling operation on the original image x.

[0079] By analyzing the activation levels of the penultimate layer of the model, this invention finds that adversarial examples generated by different adversarial attack algorithms often focus only on specific channels, as shown in Figure 4. In Figure 4, "weight" represents each channel's weight for a specific category, with all channels arranged in descending order of this weight, and "Clean image activation" represents the activations produced when the original image is input into the model. The method of this invention is compared with three other adversarial algorithms: FGSM (Fast Gradient Sign Method), DTMI, and SU (Self-Universality). As Figure 4 shows, although their specific values differ significantly, the activation shapes and peak positions of FGSM, DTMI, and SU are remarkably similar. This indicates that these algorithms tend to deceive the model by raising the activation levels of specific channels. This is effective in white-box attacks, but in black-box attacks, because different models may focus on different features, the adversarial examples may overfit the white-box substitute model, resulting in low transferability.

[0080] The specific construction process of the adversarial algorithm DTMI is as follows:

[0081] Let f denote the white-box substitution model and v the attacked black-box model. Let $x \in \mathbb{R}^{C \times H \times W}$ denote a benign sample, also called the original image, where C, H, and W are the number of channels, height, and width of image x, respectively; the category of x is $y_s$, and K is the total number of categories. Given a target category $y_t \ne y_s$, the goal of a targeted attack is to use the white-box substitution model to generate an adversarial example $x_{adv}$ such that $v(x_{adv}) = y_t$ when it is input into the attacked black-box model. This invention uses the $L_\infty$ norm to bound the adversarial perturbation, i.e., $\|x_{adv} - x\|_\infty = \|\delta\|_\infty \le \epsilon$, where δ is the adversarial perturbation and ε is a constant bounding it. Given the classification loss J (e.g., Logit loss) of the white-box surrogate model f, the I-FGSM (iterative FGSM) attack algorithm in the targeted setting is:

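[0082] $\delta_0 = 0,\quad g_0 = 0,$

[0083] $g_{i+1} = \nabla_{\delta} J\big(f(x + \delta_i), y_t\big),$

[0084] $\delta_{i+1} = \delta_i - \alpha \cdot \mathrm{sign}(g_{i+1}),$

[0085] $\delta_{i+1} = \mathrm{Clip}_{x,\epsilon}(\delta_{i+1}),$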

[0086] Here $\delta_0$ and $g_0$ denote the initial perturbation and gradient; $\delta_i$ and $g_i$ denote the adversarial perturbation and gradient at the i-th iteration; and $\delta_{i+1}$ and $g_{i+1}$ denote those at the (i+1)-th iteration, where $i \in \{0, \dots, I-1\}$ and I is the maximum number of iterations. α is the step size of each iteration. The function $\operatorname{Clip}_{x,\epsilon}(\cdot)$ projects the adversarial perturbation δ into the ε-neighborhood of the original image x so that the $L_\infty$ norm constraint is satisfied. Specifically, I-FGSM first initializes $\delta_0$ and $g_0$ to 0, then computes the gradient of the loss with respect to the adversarial perturbation δ and updates δ along the sign of that gradient; in a targeted attack, the update moves in the direction that decreases the loss. Finally, the Clip operation enforces the $L_\infty$ norm bound. I-FGSM is very effective in white-box attack scenarios but performs poorly in black-box scenarios, not only because it easily becomes trapped in poor local optima, but also because the adversarial examples it generates largely overfit the white-box surrogate model, which makes them hard to transfer to other models. The existing Diverse Input (DI), Translation-Invariant (TI), and Momentum (MI) methods alleviate this overfitting to some extent. Therefore, this invention uses as its baseline an algorithm that combines these three methods (DI, TI, and MI) with I-FGSM; for convenience, this combined method is called DTMI. The gradient update formula of DTMI is:
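A minimal NumPy sketch of the targeted I-FGSM loop follows. The `grad_fn` callback stands in for an autodiff framework, and the linear surrogate and its weights are hypothetical toy values chosen so the result can be checked by hand: with Logit loss J = -z_t the input gradient of a linear model is constant, so every coordinate of δ walks to ±ε.

```python
import numpy as np

def targeted_i_fgsm(x, grad_fn, eps, alpha, num_iter):
    """Targeted I-FGSM under an L-infinity budget.

    grad_fn(x_adv) returns the gradient of the targeted loss J(f(x_adv), y_t)
    with respect to the input; since x_adv = x + delta, this equals dJ/d(delta).
    """
    delta = np.zeros_like(x)                    # delta_0 = 0
    for _ in range(num_iter):
        g = grad_fn(x + delta)                  # g_{i+1}
        delta = delta - alpha * np.sign(g)      # step in the loss-decreasing direction
        delta = np.clip(delta, -eps, eps)       # Clip_{x,eps}: keep ||delta||_inf <= eps
    return delta

# Toy linear surrogate (hypothetical): logits z = W @ x, target class t.
W = np.array([[0.5, -1.0, 0.2, 0.0],
              [-0.3, 0.8, -0.6, 1.0],
              [1.2, -0.4, 0.7, -0.9]])
x = np.array([0.2, 0.4, 0.6, 0.8])
t = 2

def logit_loss_grad(x_in):
    # Logit loss J = -z_t; for a linear model its input gradient is -W[t] everywhere.
    return -W[t]

delta = targeted_i_fgsm(x, logit_loss_grad, eps=16 / 255, alpha=2 / 255, num_iter=10)
print(np.round(delta * 255, 6).tolist())  # [16.0, -16.0, 16.0, -16.0]
```

Many implementations also clip x + δ to the valid pixel range after the L∞ projection; the text does not say whether that step is used here.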

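[0087] $g_{i+1} = \mu \cdot g_i + \dfrac{W * \nabla_{\delta} J\big(f(T(x+\delta_i, p)),\, y_t\big)}{\big\| W * \nabla_{\delta} J\big(f(T(x+\delta_i, p)),\, y_t\big) \big\|_1},$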

[0088] where J is the classification loss of the white-box surrogate model f; the function $T(x+\delta_i, p)$ randomly resizes and pads its input with probability p, as in the DI method; W is the predefined convolution kernel applied to the gradient, as in the TI method; and μ is the momentum decay factor of the MI method.
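One DTMI gradient step can be sketched as below, assuming the usual MI/TI formulation from the transfer-attack literature: the raw input gradient is filtered with the kernel W, normalized by its L1 norm, and accumulated with momentum μ. The Gaussian kernel and its size and σ are hypothetical choices, and the DI transform T(·, p) is omitted; in a full attack it would be applied to x + δ_i before the raw gradient is computed.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Normalized 2-D Gaussian kernel, used here as the TI kernel W (assumed choice)."""
    ax = np.arange(size) - (size - 1) / 2
    k = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def smooth(grad, kernel):
    """Filter each channel of a (C, H, W) gradient with an odd-sized kernel.

    Zero padding keeps the output size; this is cross-correlation, which
    equals convolution for a symmetric kernel such as a Gaussian.
    """
    kh, kw = kernel.shape
    _, h, w = grad.shape
    padded = np.pad(grad, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(grad)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[:, i:i + h, j:j + w]
    return out

def dtmi_gradient(g_prev, raw_grad, kernel, mu=1.0):
    """g_{i+1} = mu * g_i + (W * grad) / ||W * grad||_1 (assumed MI/TI form)."""
    s = smooth(raw_grad, kernel)
    return mu * g_prev + s / (np.abs(s).sum() + 1e-12)

# Example: an impulse gradient is spread by the kernel, then L1-normalized.
grad = np.zeros((1, 9, 9))
grad[0, 4, 4] = 1.0
k = gaussian_kernel(5, 1.0)
g1 = dtmi_gradient(np.zeros_like(grad), grad, k, mu=1.0)
print(round(float(np.abs(g1).sum()), 6))  # 1.0
```

The L1 normalization makes each new gradient contribute a comparable amount to the momentum term regardless of its raw scale, which is why the sign step that follows sees a stable direction.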

[0089] Based on the above phenomena, this invention proposes a channel feature selection method that prevents the adversarial algorithm from over-focusing on specific features while generating adversarial examples, thereby alleviating the overfitting of adversarial examples to the surrogate model and improving their transferability.

[0090] Specifically, let $A_l^{\mathrm{in}}(x)$ and $A_l(x) \in \mathbb{R}^{C_l \times H_l \times W_l}$ denote the input and output, respectively, of layer l of model f when the original image x is input, where $C_l$, $H_l$, and $W_l$ are the number of channels, channel height, and channel width of the output of layer l. The channel feature selection method is then:

[0091] $\tilde{A}_l(x) = \big[\, P_1 A_l^1(x),\; P_2 A_l^2(x),\; \dots,\; P_{C_l} A_l^{C_l}(x) \,\big],$

[0092] where $A_l^i(x)$ is the value of the i-th channel of the output of layer l when the original image x is input into model f, and $P = [P_1, \dots, P_n]$ is an array of $n = C_l$ elements with $P \sim \mathrm{Bernoulli}(p_b)$, where $p_b$ is a predefined constant.
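The channel feature selection step can be sketched as a channel-wise Bernoulli mask, assuming (as the dropout analogy in the next paragraph suggests) that P multiplies each output channel of layer l and that a fresh mask is drawn at every attack iteration. The function name is hypothetical, and whether the method rescales by 1/p_b is not stated, so no rescaling is applied.

```python
import numpy as np

def channel_feature_selection(features, p_b, rng):
    """Channel-wise Bernoulli masking of a (C, H, W) layer output.

    One value P_i ~ Bernoulli(p_b) is drawn per channel, and channel i is
    multiplied by it, so each channel is kept whole or zeroed whole. Unlike
    standard dropout, no 1/p_b rescaling is applied (the text specifies none).
    """
    mask = rng.binomial(1, p_b, size=features.shape[0])   # P, one entry per channel
    return features * mask[:, None, None], mask

rng = np.random.default_rng(0)
feats = np.ones((6, 2, 2))
masked, mask = channel_feature_selection(feats, p_b=0.5, rng=rng)
print(mask)   # which channels survived this draw
```

In a framework such as PyTorch, the same masking would typically be attached to layer l of the surrogate (for example via a forward hook), so that every forward pass during the attack sees a different subset of channels.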

[0093] The channel feature selection method of this invention is similar to dropout in model training: dropout helps prevent the trained model from overfitting the training data, while the channel feature selection method helps prevent the trained adversarial perturbation from overfitting the surrogate model.

[0094] Using the obtained saliency map $x_{sa}$ together with the channel feature selection method and the DI method, this invention defines the loss function as:

[0095]

[0096] where the surrogate model in the expression is the one with the channel feature selection method applied at layer l, the parameter λ sets the weight of the second term of the expression, and the loss function J can be either CE loss or Logit loss.
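The two options for J can be written as below for a targeted attack: CE loss is the negative log-softmax probability of the target class, and Logit loss is the negative target logit. The `combined_loss` helper is hypothetical: it assumes L(δ) adds J on the original-image branch to λ times J on the local-image branch, since the exact expression is not given in the surrounding text.

```python
import numpy as np

def targeted_ce(logits, t):
    """Cross-entropy toward target class t: -log softmax(logits)[t]."""
    z = logits - logits.max()                      # shift for numerical stability
    return float(np.log(np.exp(z).sum()) - z[t])

def targeted_logit(logits, t):
    """Logit loss toward target class t: the negative target logit."""
    return float(-logits[t])

def combined_loss(loss_fn, logits_orig, logits_local, t, lam):
    """Hypothetical two-term form: J(original branch) + lam * J(local-image branch)."""
    return loss_fn(logits_orig, t) + lam * loss_fn(logits_local, t)

print(targeted_logit(np.array([1.0, 3.0]), 1))    # -3.0
```

Minimizing either loss pushes the target logit up; the CE version additionally pushes the other logits down through the softmax normalizer.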

[0097] The objective function of this invention is:

[0098] $\min_{\delta} L(\delta) \quad \text{s.t.} \quad \|\delta\|_\infty \le \epsilon,$

[0099] where ε is the upper bound of the perturbation.

[0100] The loss function L(δ) guides the iterative update of the adversarial perturbation δ. Each iteration proceeds as follows:

[0101]

[0102] $\delta_{i+1} = \delta_i - \alpha \cdot \operatorname{sign}(g_{i+1}),$

[0103] $\delta_{i+1} = \operatorname{Clip}_{x,\epsilon}(\delta_{i+1}).$

[0104] This invention uses four different models, ResNet50, DenseNet121, VGGNet16, and Inception-v3, as surrogate models to generate adversarial perturbations. The black-box models are ResNet50, DenseNet121, VGGNet16, Inception-v3, vgg19_bn, mobilenet_v3_large, PNASNet, and SENet.

[0105] In addition, this invention also tests five robust models: adv_inception_v3, ens_adv_inception_resnet_v2, DeepAugment_AugMix, resnet50_trained_on_SIN, and resnet50_trained_on_SIN_and_IN. This invention uses the ImageNet-compatible dataset, first used in the NIPS 2017 adversarial attack and defense competition; it consists of 1000 images with corresponding labels and is designed for targeted adversarial attacks.

[0106] This invention uses the Targeted Attack Success Rate (TASR) to measure attack effectiveness on a black-box model, that is, the proportion of adversarial examples that the black-box model classifies as the target category. Methods with higher TASR values therefore generate adversarial examples with better targeted transferability.
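TASR reduces to a simple ratio; the sketch below assumes the black-box model's predicted labels have already been collected, and the function name is hypothetical.

```python
import numpy as np

def targeted_attack_success_rate(predictions, targets):
    """Fraction of adversarial examples the black-box model assigns to their target class."""
    predictions = np.asarray(predictions)
    targets = np.asarray(targets)
    return float(np.mean(predictions == targets))

print(targeted_attack_success_rate([3, 7, 7, 1], [3, 7, 2, 1]))  # 0.75
```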

[0107] Figure 5 shows the results of attacking normal models. First, the attack performance of this invention is significantly better than that of the SU attack method across different surrogate models and loss functions. Notably, when ResNet50 is the surrogate model, this invention improves on SU by an average of 19.69% with CE loss and 15.74% with Logit loss. Second, attacks using ResNet50 and DenseNet121 as surrogate models perform better than those using VGGNet16 and Inception-v3. ResNet50 and DenseNet121 contain skip connections, which largely alleviate the vanishing-gradient problem as the number of iterations increases and make the adversarial examples more transferable. Third, attacks using Logit loss are generally more effective than those using CE loss, because CE loss applies the softmax function, which makes gradients more prone to vanishing during differentiation. However, with this invention the gap between CE and Logit loss is smaller than with the SU method, because this invention better alleviates the vanishing-gradient problem and thus mitigates the shortcomings of CE loss.
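The vanishing-gradient argument for CE loss can be checked numerically with a toy logit vector (hypothetical: 1000 classes, target logit equal to a margin, all others zero). The derivative of CE with respect to the target logit is p_t - 1, which shrinks toward 0 as the target probability saturates, whereas the Logit loss derivative stays at -1. This isolates the softmax effect only; it does not reproduce the Figure 5 experiments.

```python
import numpy as np

def grad_wrt_target_logit(margin, num_classes=1000):
    """Gradients of CE and Logit loss w.r.t. the target logit z_t.

    Toy setup (hypothetical): z_t = margin, all other logits = 0.
    """
    z = np.zeros(num_classes)
    z[0] = margin                                # class 0 plays the target
    p = np.exp(z - z.max())
    p /= p.sum()                                 # softmax probabilities
    ce_grad = p[0] - 1.0                         # d/dz_t of -log softmax(z)[t]
    logit_grad = -1.0                            # d/dz_t of -z_t
    return ce_grad, logit_grad

for m in (0.0, 5.0, 10.0, 20.0):
    ce, lg = grad_wrt_target_logit(m)
    print(f"margin={m:5.1f}  dCE/dz_t={ce:.2e}  dLogit/dz_t={lg:.1f}")
```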

[0108] Figure 6 shows the results of attacking robust models. This invention considers two adversarially trained models, a model trained with the DeepAugment (DA) method, and stylized ImageNet models. ResNet50 and DenseNet121 are used to generate adversarial examples against these robust models, and Figure 6 lists the results of the different methods after 500 iterations under different surrogate models, ε values, and loss functions. As Figure 6 shows, this invention improves significantly on the SU method: with CE and Logit loss, it achieves improvements of up to 20.8% and 29.6%, respectively.

[0109] These experiments demonstrate that, compared to the best current methods, the adversarial examples generated by this invention not only significantly improve the effectiveness of attacking ordinary models, but also have better generalization ability for robust models.

[0110] The above description is only a preferred embodiment of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.

Claims

1. A targeted adversarial attack enhancement method based on channel feature selection, characterized in that it comprises the following steps: Step S1: calculating a heat map using the Grad-CAM algorithm to obtain the salient regions, and filling in the salient regions to form a saliency map; Step S2: randomly cropping the saliency map to obtain a local image, scaling the local image to the same size as the original image, adding the same adversarial perturbation to the original image and the local image, inputting them into the CNN, and applying the channel feature selection method to optimize the adversarial perturbation through the loss. First, using the obtained saliency map, the channel feature selection method, and the DI method, the loss function L(δ) is defined, where the surrogate model used in it applies the channel feature selection method at layer l, the parameter λ sets the weight of the second term of the expression, J denotes the loss function, $x_{sa}$ denotes the saliency map, and δ denotes the adversarial perturbation. The channel feature selection method operates as follows: let $A_l^{\mathrm{in}}(x)$ and $A_l(x)$ denote the input and output of layer l of model f when the original image x is input, and let $C_l$, $H_l$, and $W_l$ denote the number of channels, channel height, and channel width of the output of layer l; the channel feature selection method is then $\tilde{A}_l(x) = [P_1 A_l^1(x), \dots, P_{C_l} A_l^{C_l}(x)]$, where $A_l^i(x)$ is the value of the i-th channel of the output of layer l when the original image x is input into model f, and P is an array of $n = C_l$ elements with $P \sim \mathrm{Bernoulli}(p_b)$, where $p_b$ is a predefined constant.

2. The targeted adversarial attack enhancement method based on channel feature selection according to claim 1, characterized in that, in step S1, the importance of feature $A_l^k$ for category c is first approximated by $\alpha_{l,k}^c = \frac{1}{Z} \sum_{i=1}^{H_l} \sum_{j=1}^{W_l} \frac{\partial f^c(x)}{\partial A_{l,ij}^k}$, where Z is a normalization constant such that $Z = H_l \times W_l$; $\alpha_{l,k}^c$ is the attention weight of the k-th channel in layer l for category c; $H_l$ and $W_l$ are the height and width of each channel in layer l; f is the surrogate model, and $f^c(x)$ is the output value for category c after sample x is input into the surrogate model. Then all channels in the feature layer are weighted and combined to obtain a heat map $M_l^c = \sum_k \alpha_{l,k}^c A_l^k$, where $M_l^c$ is the heat map of the l-th feature layer for category c. Finally, the heat map $M_l^c$ is processed with the ReLU activation function and scaled to the same size as the original image.
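The Grad-CAM computation in claim 2 can be sketched as follows, assuming the layer activations and the gradients of the category score with respect to them are already available from an autodiff framework. Nearest-neighbor upscaling is a simplification (bilinear interpolation is more common), and the function name is hypothetical.

```python
import numpy as np

def grad_cam(activations, gradients, out_size):
    """Grad-CAM heat map from one layer.

    activations: (C, H, W) feature maps A^k of the chosen layer.
    gradients:   (C, H, W) gradients of the category score f^c(x) w.r.t. A^k.
    out_size:    (H_img, W_img) of the original image.
    """
    _, H, W = activations.shape
    alpha = gradients.sum(axis=(1, 2)) / (H * W)        # alpha_k^c with Z = H * W
    cam = np.tensordot(alpha, activations, axes=1)      # sum_k alpha_k^c A^k -> (H, W)
    cam = np.maximum(cam, 0.0)                          # ReLU
    # Nearest-neighbor upscaling to the image size (a simplification).
    rows = np.arange(out_size[0]) * H // out_size[0]
    cols = np.arange(out_size[1]) * W // out_size[1]
    return cam[rows[:, None], cols[None, :]]

acts = np.array([[[1.0, 2.0], [3.0, 4.0]],
                 [[2.0, 2.0], [2.0, 2.0]]])
grads = np.stack([np.ones((2, 2)), -np.ones((2, 2))])  # alpha = [1, -1]
print(grad_cam(acts, grads, (4, 4)))
```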

3. The targeted adversarial attack enhancement method based on channel feature selection according to claim 1, characterized in that, in step S1, salient regions are first extracted from the heat map and then processed to generate the saliency map $x_{sa}$.

4. The targeted adversarial attack enhancement method based on channel feature selection according to claim 3, characterized in that, when salient regions are extracted from the heat map, either a pixel-level extraction method or a rectangle extraction method is used. In the pixel-level extraction method, the part of the heat map whose values exceed a threshold is retained, element by element over the position coordinates of the map; the retained (effective) part is the salient block, whose dimensions are the same as those of the original image. In the rectangle extraction method, the smallest rectangle containing all of the effective part is taken as the salient block.
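The two extraction methods can be sketched as follows. The sketch assumes an (H, W, 3) image, that the pixel-level salient block keeps the image pixels where the heat map exceeds the threshold and zeroes the rest, and that the rectangle method crops the image to the bounding box; the zero fill and the function names are hypothetical.

```python
import numpy as np

def pixel_level_block(image, heatmap, threshold):
    """Keep image pixels where the heat map exceeds `threshold`; zero elsewhere (assumed fill)."""
    keep = heatmap > threshold                  # (H, W) boolean mask of the effective part
    return image * keep[..., None], keep        # block has the same size as the image

def rectangle_block(image, keep):
    """Crop the smallest axis-aligned rectangle containing every retained pixel."""
    rows = np.flatnonzero(keep.any(axis=1))
    cols = np.flatnonzero(keep.any(axis=0))
    if rows.size == 0:
        return image[:0, :0]                    # nothing exceeds the threshold
    return image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

image = np.arange(4 * 5 * 3, dtype=float).reshape(4, 5, 3)
heat = np.zeros((4, 5))
heat[1, 2], heat[2, 3] = 0.9, 0.8
block, keep = pixel_level_block(image, heat, threshold=0.5)
print(rectangle_block(image, keep).shape)       # (2, 2, 3)
```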

5. The targeted adversarial attack enhancement method based on channel feature selection according to claim 3, characterized in that the saliency map $x_{sa}$ is generated in one of two ways: the long side of the salient block is scaled directly to the size of the original image, and the empty parts are filled with uniform noise around the average color of the original image; or the salient block is copied four times and placed in the top, bottom, left, and right parts of the image, and the empty parts are filled with uniform noise around the average color of the original image.
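The first saliency-map construction in claim 5 (scale the block's long side, then fill with noise) might look like the sketch below. The noise half-width `noise_range`, the centered placement, nearest-neighbor resizing, and [0, 1] pixel values are all assumptions not stated in the claim, and the function names are hypothetical.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize of an (h, w, C) array."""
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[rows[:, None], cols[None, :]]

def saliency_map_by_scaling(block, image, noise_range, rng):
    """Scale the block's long side to the image size and fill the rest with noise.

    block: (h, w, C) salient block; image: (H, W, C) original image in [0, 1].
    noise_range: half-width of the uniform noise band around the mean color (hypothetical).
    """
    H, W, C = image.shape
    h, w = block.shape[:2]
    s = min(H / h, W / w)                       # the long side reaches the image border
    nh, nw = max(1, round(h * s)), max(1, round(w * s))
    mean_color = image.reshape(-1, C).mean(axis=0)
    canvas = mean_color + rng.uniform(-noise_range, noise_range, size=image.shape)
    top, left = (H - nh) // 2, (W - nw) // 2    # centered placement (assumption)
    canvas[top:top + nh, left:left + nw] = resize_nearest(block, nh, nw)
    return np.clip(canvas, 0.0, 1.0)

rng = np.random.default_rng(0)
out = saliency_map_by_scaling(np.ones((2, 4, 3)), np.full((8, 8, 3), 0.5), 0.1, rng)
print(out.shape)   # (8, 8, 3)
```

The four-copy variant in the same claim would reuse the same noise-filled canvas and paste the block at four positions instead of one.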

6. The targeted adversarial attack enhancement method based on channel feature selection according to claim 1, characterized in that, In step S2, during the adversarial perturbation generation stage, the saliency map corresponding to the original image is first randomly cropped and scaled to make its size the same as the original image, in order to increase the diversity of the local image and thus increase the transferability of the adversarial example.

7. The targeted adversarial attack enhancement method based on channel feature selection according to claim 6, characterized in that the RandomResizedCrop function from the torchvision library is used to randomly crop and scale the saliency map; the RandomResizedCrop function takes a cropping-range parameter as input, one term of which is the lower limit of the fraction of the original image's area covered by the randomly cropped portion, and the other of which, used as the upper bound, is the midpoint between the upper and lower limits of that area fraction; a corresponding operator denotes the random cropping and scaling of the original image.
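In torchvision, `transforms.RandomResizedCrop(size, scale=(lower, upper))` samples a crop whose area is a random fraction of the image area within `scale`, then resizes it to `size`. The dependency-free NumPy stand-in below mimics that behavior in simplified form (one sampling attempt, nearest-neighbor resizing) and is not torchvision's implementation; the claim's actual bounds are not given, so the values in the example are hypothetical.

```python
import math
import numpy as np

def random_resized_crop(img, scale, ratio=(3 / 4, 4 / 3), rng=None):
    """Simplified stand-in for torchvision's RandomResizedCrop on an (H, W, C) array.

    scale: (lower, upper) bounds on the cropped area as a fraction of the image area.
    ratio: range of crop aspect ratios (width / height), sampled log-uniformly.
    The crop is resized back to (H, W) with nearest-neighbor sampling.
    """
    if rng is None:
        rng = np.random.default_rng()
    H, W = img.shape[:2]
    area = H * W * rng.uniform(scale[0], scale[1])
    aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
    ch = min(H, max(1, int(round(math.sqrt(area / aspect)))))
    cw = min(W, max(1, int(round(math.sqrt(area * aspect)))))
    top = int(rng.integers(0, H - ch + 1))
    left = int(rng.integers(0, W - cw + 1))
    crop = img[top:top + ch, left:left + cw]
    rows = np.arange(H) * ch // H
    cols = np.arange(W) * cw // W
    return crop[rows[:, None], cols[None, :]]

img = np.arange(16 * 16 * 3).reshape(16, 16, 3)
local = random_resized_crop(img, scale=(0.25, 0.5), rng=np.random.default_rng(1))
print(local.shape)   # (16, 16, 3)
```

Each call yields a different local view of the saliency map at the original resolution, which is the diversity that claim 6 relies on.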

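8. The targeted adversarial attack enhancement method based on channel feature selection according to claim 1, characterized in that, in step S2, the objective function is $\min_{\delta} L(\delta)$ s.t. $\|\delta\|_\infty \le \epsilon$, where ε is the upper bound of the perturbation; the loss function L(δ) guides the iterative update of the adversarial perturbation δ, which comprises computing the gradient $g_{i+1}$ of the loss with respect to the perturbation, the update $\delta_{i+1} = \delta_i - \alpha \cdot \operatorname{sign}(g_{i+1})$, and the projection $\delta_{i+1} = \operatorname{Clip}_{x,\epsilon}(\delta_{i+1})$.

9. The targeted adversarial attack enhancement method based on channel feature selection according to claim 1, characterized in that the loss function used is either CE loss or Logit loss.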