A method and system for generating adversarial examples based on a self-attention mechanism
By constructing a generator with a self-attention mechanism and a multi-convolutional discriminator model, adversarial examples capable of deceiving the target model are generated, solving the problem of deep neural networks being vulnerable to attacks and improving the model's security and robustness.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NANJING UNIV OF AERONAUTICS & ASTRONAUTICS
- Filing Date
- 2024-07-18
- Publication Date
- 2026-06-30
AI Technical Summary
Deep neural networks are vulnerable to adversarial attacks, resulting in insufficient security and robustness, making them difficult to apply in safety-critical systems.
We construct a generator model and a multi-convolution discriminator model based on a self-attention mechanism. By designing a loss function and performing alternating optimization, we generate adversarial examples that can deceive the target model. We also utilize the self-attention mechanism to capture long-distance dependencies and high-dimensional features of images.
It improves the security and robustness of deep learning models, enhances the concealment of adversarial examples and the accuracy of discriminators, and reduces the risk of models being misled.
Figure CN119089973B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of artificial intelligence security, and specifically relates to a method and system for generating adversarial examples based on a self-attention mechanism.
Background Technology
[0002] Deep neural networks are vulnerable to adversarial examples, which are difficult to distinguish with the naked eye but can cause deep neural networks to produce incorrect outputs, thus posing serious safety risks. The existence of adversarial examples hinders the practical application of deep learning in safety-critical systems. For example, in autonomous driving systems, adversarial examples in the input image may prevent the model from correctly recognizing traffic signs or detecting pedestrians, leading to serious traffic accidents. Therefore, in systems with high safety requirements, more robust neural network models are needed.
[0003] Due to their black-box nature, traditional software testing methods are difficult to apply directly to deep learning models. Adversarial testing is a testing method suitable for deep neural networks. This method intentionally introduces perturbations into the input data and observes the changes in the model's output before and after the perturbation, thereby evaluating the model's stability and robustness to small changes in the input data and helping to identify potential vulnerabilities in the model.
[0004] Computer vision is one of the fields that heavily relies on deep learning. With increasing applications of deep learning techniques in image recognition tasks, adversarial examples incorporating perturbations have become a primary threat. In image classification, adversarial examples refer to intentionally synthesized image samples where attackers add minute, imperceptible perturbations to the original image. Attackers then use these synthesized images as input to a classifier, causing it to misclassify and achieving their attack objective.
[0005] In recent years, many scholars have begun to focus on adversarial examples in the image domain. Research on adversarial examples is particularly important for image classification for two reasons: first, it helps to deepen the understanding of the working principles and prediction mechanisms of deep learning models. By analyzing the generation process of adversarial examples, weaknesses in the model's decision-making process can be revealed, thereby improving the model's interpretability; second, computer vision is a large, rapidly developing field with a wide range of applications, making it more exposed to adversarial attacks. Therefore, research on adversarial examples helps improve the robustness, security, and interpretability of deep learning models, providing security guarantees for their real-world applications.
Summary of the Invention
[0006] Purpose of the invention: In order to ensure the security, reliability and robustness of deep learning models in real-world applications, this invention proposes an adversarial example generation method and system based on a self-attention mechanism.
[0007] Technical solution: An adversarial example generation method based on self-attention mechanism, comprising the following steps:
[0008] A generator model is constructed, comprising an encoder and a decoder. The encoder comprises a first input layer, unit 1, unit 2, unit 3, and unit 4 connected in sequence. Unit 1 consists of a first convolutional layer, a first batch normalization layer, and a first Leaky ReLU layer; unit 2 consists of a second convolutional layer, a second batch normalization layer, and a second Leaky ReLU layer; unit 3 consists of a third convolutional layer, a third batch normalization layer, and a third Leaky ReLU layer; and unit 4 consists of a fourth convolutional layer, a first self-attention layer, a fourth batch normalization layer, and a fourth Leaky ReLU layer. The output of unit 1 is connected to the output of unit 3 via a residual connection module. The decoder comprises units 5, 6, 7, and 8 connected in sequence. Unit 5 consists of a fifth convolutional layer, a second self-attention layer, a fifth batch normalization layer, and a fifth Leaky ReLU layer; unit 6 consists of a first deconvolutional layer, a sixth batch normalization layer, and a sixth Leaky ReLU layer; unit 7 consists of a second deconvolutional layer, a seventh batch normalization layer, and a seventh Leaky ReLU layer; and unit 8 consists of a third deconvolutional layer, an eighth batch normalization layer, and an eighth Leaky ReLU layer.
[0009] A multi-convolutional discriminator model is constructed, comprising a second input layer, a sixth convolutional layer, a seventh convolutional layer, an eighth convolutional layer, and a ninth convolutional layer connected in sequence; the second input layer is used to input the original data samples and the adversarial samples generated by the generator model;
[0010] The loss function is designed, which consists of a misjudgment loss function, an adversarial loss function, and a hinge loss function. The misjudgment loss function is used to guide the generator model to train in the direction of adversarial examples. The adversarial loss function is used to maintain the dynamic balance between the performance of the generator and the discriminator. The hinge loss function is used to limit the magnitude of the adversarial perturbation in the output of the generator model.
[0011] Guided by the loss function, the generator model and the multi-convolutional discriminator model are alternately optimized using the training dataset for adversarial training. Finally, after the performance of the generator model and the multi-convolutional discriminator model reaches a dynamic balance, the trained generator model is saved.
[0012] The original sample is input into the trained generator model to generate adversarial perturbations. The generated adversarial perturbations are then superimposed on the original sample to obtain adversarial examples.
[0013] Furthermore, the mapping from the convolutional feature map output by the convolutional layer to the self-attention feature map output by the self-attention layer can be expressed as follows:
[0014] The convolutional feature map output by the convolutional layer is $X = (x_1, x_2, x_3, \ldots, x_N)$, $X \in \mathbb{R}^{C \times N}$, where C and N are the number of channels and the number of feature locations in the input feature map, respectively;
[0015] First, a 1x1 convolution kernel is used to linearly transform the values of the C channels of each pixel, mapping the image features into two feature spaces f and g, where $f(x) = W_f x$ and $g(x) = W_g x$. The output of feature space f at position i is transposed and multiplied by the output of feature space g at position j, and the result is normalized to obtain the attention weight between the two positions, specifically expressed as:
[0016] $$\beta_{j,i} = \frac{\exp(s_{ij})}{\sum_{k=1}^{N} \exp(s_{kj})}, \qquad s_{ij} = f(x_i)^{T} g(x_j)$$
[0017] In the formula, $\beta_{j,i}$ represents the degree to which the model depends on the i-th position when synthesizing the j-th region of the image within the self-attention framework; $s_{ij}$ represents the weight between the i-th and j-th positions in the image, which is used in the subsequent attention weight calculation;
[0018] The attention map obtained from the above operations is multiplied by the feature map in feature space h and then processed by the weight matrix $W_v$. The final output is $O = (O_1, O_2, \ldots, O_j, \ldots, O_N)$, where:
[0019] $$O_j = v\left(\sum_{i=1}^{N} \beta_{j,i}\, h(x_i)\right), \qquad h(x_i) = W_h x_i, \qquad v(x_i) = W_v x_i$$
[0020] In the formula, $W_h$ and $W_v$ are weight matrices obtained through 1x1 convolution operations, $v(x_i)$ denotes the transformation applied to $x_i$ by the weight matrix $W_v$, and $h(x_i)$ denotes the value of $x_i$ after transformation by the weight matrix $W_h$;
[0021] Finally, $O_j$ is multiplied by a learnable scalar $\lambda$ and added to the original input feature map, yielding a feature map that incorporates attention:
[0022] $y_i = \lambda O_i + x_i$.
[0023] Furthermore, the misjudgment loss function is expressed as:
[0024] $L_{adv} = \mathbb{E}_x\, \lambda_f(x + G(x), t)$
[0025] where f is the target model being attacked, t is the true label of the input data x, $\lambda_f$ reflects the ability of the target model f to identify synthetic samples, $\mathbb{E}_x$ denotes the expectation over the input data x, and G(x) represents the adversarial perturbation generated by the generator model.
[0026] Furthermore, the adversarial loss function is expressed as:
[0027] $L_{GAN} = \mathbb{E}_x \log D(x) + \mathbb{E}_x \log(1 - D(x + G(x)))$
[0028] where $\mathbb{E}_x$ denotes the expectation over the input data x, G(x) represents the adversarial perturbation generated by the generator model, and D(x) represents the probability that the discriminator model judges the sample as true.
[0029] Furthermore, the hinge loss function is expressed as:
[0030] $L_{hinge} = \mathbb{E}_x \max(0, \|G(x)\|_2 - c)$
[0031] where the constant c is the optimization bound on the perturbation, G(x) represents the adversarial perturbation generated by the generator model, and $\|\cdot\|_2$ denotes the $\ell_2$ norm of the adversarial perturbation.
[0032] This invention discloses an adversarial example generation system based on a self-attention mechanism, comprising:
[0033] A trained generator model, used to generate adversarial perturbations for the original samples;
[0034] An adversarial example generation module, used to superimpose the generated adversarial perturbations onto the original samples to obtain adversarial examples;
[0035] The trained generator model is obtained according to the following steps:
[0036] A generator model is constructed, comprising an encoder and a decoder. The encoder comprises a first input layer, unit 1, unit 2, unit 3, and unit 4 connected in sequence. Unit 1 consists of a first convolutional layer, a first batch normalization layer, and a first Leaky ReLU layer; unit 2 consists of a second convolutional layer, a second batch normalization layer, and a second Leaky ReLU layer; unit 3 consists of a third convolutional layer, a third batch normalization layer, and a third Leaky ReLU layer; and unit 4 consists of a fourth convolutional layer, a first self-attention layer, a fourth batch normalization layer, and a fourth Leaky ReLU layer. The output of unit 1 is connected to the output of unit 3 via a residual connection module. The decoder comprises units 5, 6, 7, and 8 connected in sequence. Unit 5 consists of a fifth convolutional layer, a second self-attention layer, a fifth batch normalization layer, and a fifth Leaky ReLU layer; unit 6 consists of a first deconvolutional layer, a sixth batch normalization layer, and a sixth Leaky ReLU layer; unit 7 consists of a second deconvolutional layer, a seventh batch normalization layer, and a seventh Leaky ReLU layer; and unit 8 consists of a third deconvolutional layer, an eighth batch normalization layer, and an eighth Leaky ReLU layer.
[0037] A multi-convolutional discriminator model is constructed, comprising a second input layer, a sixth convolutional layer, a seventh convolutional layer, an eighth convolutional layer, and a ninth convolutional layer connected in sequence; the second input layer is used to input the original data samples and the adversarial samples generated by the generator model;
[0038] The loss function is designed, which consists of a misjudgment loss function, an adversarial loss function, and a hinge loss function. The misjudgment loss function is used to guide the generator model to train in the direction of adversarial examples. The adversarial loss function is used to maintain the dynamic balance between the performance of the generator and the discriminator. The hinge loss function is used to limit the magnitude of the adversarial perturbation in the output of the generator model.
[0039] Guided by the loss function, the generator model and the multi-convolutional discriminator model are alternately optimized using the training dataset for adversarial training. Finally, the generator model and the multi-convolutional discriminator model reach a dynamic balance in performance, resulting in a well-trained generator model.
[0040] Furthermore, the mapping from the convolutional feature map output by the convolutional layer to the self-attention feature map output by the self-attention layer can be expressed as follows:
[0041] The convolutional feature map output by the convolutional layer is $X = (x_1, x_2, x_3, \ldots, x_N)$, $X \in \mathbb{R}^{C \times N}$, where C and N are the number of channels and the number of feature locations in the input feature map, respectively;
[0042] First, a 1x1 convolution kernel is used to linearly transform the values of the C channels of each pixel, mapping the image features into two feature spaces f and g, where $f(x) = W_f x$ and $g(x) = W_g x$. The output of feature space f at position i is transposed and multiplied by the output of feature space g at position j, and the result is normalized to obtain the attention weight between the two positions, specifically expressed as:
[0043] $$\beta_{j,i} = \frac{\exp(s_{ij})}{\sum_{k=1}^{N} \exp(s_{kj})}, \qquad s_{ij} = f(x_i)^{T} g(x_j)$$
[0044] In the formula, $\beta_{j,i}$ represents the degree to which the model depends on the i-th position when synthesizing the j-th region of the image within the self-attention framework; $s_{ij}$ represents the weight between the i-th and j-th positions in the image, which is used in the subsequent attention weight calculation;
[0045] The attention map obtained from the above operations is multiplied by the feature map in feature space h and then processed by the weight matrix $W_v$. The final output is $O = (O_1, O_2, \ldots, O_j, \ldots, O_N)$, where:
[0046] $$O_j = v\left(\sum_{i=1}^{N} \beta_{j,i}\, h(x_i)\right), \qquad h(x_i) = W_h x_i, \qquad v(x_i) = W_v x_i$$
[0047] In the formula, $W_h$ and $W_v$ are weight matrices obtained through 1x1 convolution operations, $v(x_i)$ denotes the transformation applied to $x_i$ by the weight matrix $W_v$, and $h(x_i)$ denotes the value of $x_i$ after transformation by the weight matrix $W_h$;
[0048] Finally, $O_j$ is multiplied by a learnable scalar $\lambda$ and added to the original input feature map, yielding a feature map that incorporates attention:
[0049] $y_i = \lambda O_i + x_i$.
[0050] Furthermore, the misjudgment loss function is expressed as:
[0051] $L_{adv} = \mathbb{E}_x\, \lambda_f(x + G(x), t)$
[0052] where f is the target model being attacked, t is the true label of the input data x, $\lambda_f$ reflects the ability of the target model f to identify synthetic samples, $\mathbb{E}_x$ denotes the expectation over the input data x, and G(x) represents the adversarial perturbation generated by the generator model.
[0053] Furthermore, the adversarial loss function is expressed as:
[0054] $L_{GAN} = \mathbb{E}_x \log D(x) + \mathbb{E}_x \log(1 - D(x + G(x)))$
[0055] where $\mathbb{E}_x$ denotes the expectation over the input data x, G(x) represents the adversarial perturbation generated by the generator model, and D(x) represents the probability that the discriminator model judges the sample as true.
[0056] Furthermore, the hinge loss function is expressed as:
[0057] $L_{hinge} = \mathbb{E}_x \max(0, \|G(x)\|_2 - c)$
[0058] where the constant c is the optimization bound on the perturbation, G(x) represents the adversarial perturbation generated by the generator model, and $\|\cdot\|_2$ denotes the $\ell_2$ norm of the adversarial perturbation.
[0059] Beneficial effects: Compared with the prior art, the present invention has the following advantages:
[0060] (1) This invention utilizes a convolutional neural network to construct a generator model, which can effectively process image data; and introduces a self-attention mechanism into the generator model to capture long-distance dependencies and high-dimensional features in the latent space, thereby improving the concealment of adversarial examples;
[0061] (2) The present invention uses four convolutional layers to construct a discriminator model, which makes the discriminator model more accurate in judging the authenticity of input data; at the same time, the four-layer convolutional structure can better balance the feature extraction capability and the demand for computing resources without excessively increasing the model complexity.
Description of the Drawings
[0062] Figure 1 is a diagram of the generator model structure;
[0063] Figure 2 is a diagram of the discriminator model structure;
[0064] Figure 3 is a diagram illustrating the mapping process from convolutional feature maps to self-attention feature maps.
Detailed Implementation
[0065] The method of the present invention will be further described below with reference to the accompanying drawings and embodiments.
[0066] Example 1:
[0067] This embodiment proposes an adversarial example generation method based on a self-attention mechanism, which mainly includes the following steps:
[0068] A generator model is constructed using a convolutional neural network. This model includes convolutional layers, self-attention layers, and deconvolutional layers. The convolutional layers encode the input image, capturing local and high-dimensional features. The self-attention layers capture long-range dependencies and high-dimensional features in the latent space. The deconvolutional layers reconstruct low-dimensional features, progressively enlarging the feature map to restore its spatial dimension. The generator model constructed in this embodiment learns the feature distribution of the original data, outputs adversarial perturbations, and constructs fake image samples. Compared to previous generator models, this model introduces a self-attention mechanism, which captures long-range dependencies and high-dimensional features in the latent space, improving the concealment of adversarial examples. The specific construction details and parameters are described below.
[0069] The generator model structure is shown in Figure 1. The network comprises an input layer, five convolutional layers (Conv-2D 1, Conv-2D 2, Conv-2D 3, Conv-2D 4, and Conv-2D 5), a residual block, two self-attention layers (Self-Attention Layer 1 and Self-Attention Layer 2), and three deconvolutional layers (DeConv-2D 1, DeConv-2D 2, and DeConv-2D 3). The convolutional layers extract input features, and the self-attention layers capture long-range dependencies and high-dimensional features in the latent space, improving the realism of the generated adversarial examples. The residual block helps alleviate the vanishing gradient problem and enhances the network's learning ability. Finally, the deconvolutional layers restore the image to its original spatial resolution.
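The layer ordering described for the generator can be written down as plain data and checked against the counts stated above (five convolutional layers, two self-attention layers, three deconvolutional layers). The following is only a bookkeeping sketch: the names `ENCODER`, `DECODER`, `RESIDUAL`, `layer_kind`, and `count_layers` are hypothetical, the last decoder unit is read as unit 8 with its own batch normalization and Leaky ReLU layers (an assumption), and kernel sizes and strides are left out because they belong to Table 1.

```python
from collections import Counter

# Unit layout as described in the text. Kernel sizes and strides are
# intentionally omitted; they are specified in Table 1, not here.
ENCODER = [
    ("unit1", ["conv1", "bn1", "leaky_relu1"]),
    ("unit2", ["conv2", "bn2", "leaky_relu2"]),
    ("unit3", ["conv3", "bn3", "leaky_relu3"]),
    ("unit4", ["conv4", "self_attention1", "bn4", "leaky_relu4"]),
]
DECODER = [
    ("unit5", ["conv5", "self_attention2", "bn5", "leaky_relu5"]),
    ("unit6", ["deconv1", "bn6", "leaky_relu6"]),
    ("unit7", ["deconv2", "bn7", "leaky_relu7"]),
    # Assumption: the final unit is unit 8 with its own BN and activation.
    ("unit8", ["deconv3", "bn8", "leaky_relu8"]),
]
# Residual connection from the output of unit 1 to the output of unit 3.
RESIDUAL = ("unit1", "unit3")


def layer_kind(name):
    """Strip the trailing ordinal, e.g. 'deconv3' -> 'deconv'."""
    return name.rstrip("0123456789")


def count_layers(units):
    """Count how many layers of each kind appear in a list of units."""
    return Counter(layer_kind(layer) for _, layers in units for layer in layers)


counts = count_layers(ENCODER + DECODER)
print(dict(counts))
```

Keeping the layout as data makes it easy to confirm that a concrete implementation has the same number and order of layers as the description.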
[0070] Specifically, the mapping process from convolutional feature maps to self-attention feature maps is shown in Figure 3. The convolutional feature map from the previous convolutional layer is $X = (x_1, x_2, x_3, \ldots, x_N)$, $X \in \mathbb{R}^{C \times N}$, where C and N are the number of channels and the number of feature locations of the input feature map, respectively. First, a 1x1 convolution kernel is used to linearly transform the values of the C channels of each pixel, mapping the image features into two feature spaces f and g, where $f(x) = W_f x$ and $g(x) = W_g x$. The output of feature space f at position i is transposed and multiplied by the output of feature space g at position j, and the result is normalized to obtain the attention weight between the two positions. The specific process of capturing global dependencies can be represented as:
[0071] $$\beta_{j,i} = \frac{\exp(s_{ij})}{\sum_{k=1}^{N} \exp(s_{kj})}, \qquad s_{ij} = f(x_i)^{T} g(x_j)$$
[0072] In the formula, $\beta_{j,i}$ represents the degree to which the model depends on the i-th position when synthesizing the j-th region of the image within the self-attention framework. The attention map obtained from the above operations is multiplied by the feature map in feature space h and then processed by the weight matrix $W_v$. The final output is $O = (O_1, O_2, \ldots, O_j, \ldots, O_N)$, where:
[0073] $$O_j = v\left(\sum_{i=1}^{N} \beta_{j,i}\, h(x_i)\right), \qquad h(x_i) = W_h x_i, \qquad v(x_i) = W_v x_i$$
[0074] In the above formula, $W_h$ and $W_v$ are weight matrices obtained through 1x1 convolution operations and are continuously optimized during network training. To improve memory efficiency, a channel-reduction strategy was adopted in the experiment, reducing the number of channels in the intermediate weight matrices to one-eighth of the original number of channels. Finally, $O_j$ is multiplied by a learnable scalar $\lambda$ and added to the original input feature map, yielding a feature map that incorporates attention:
[0075] $y_i = \lambda O_i + x_i$
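The mapping just described can be illustrated with a small standard-library sketch. It treats each 1x1 convolution as a per-position matrix multiplication, normalizes $s_{ij}$ with a softmax over i, and applies the residual form $y = \lambda O + x$. The function names, the toy weights, and the choice to reduce f, g, and h all to C/8 channels are assumptions made for illustration, not details confirmed by the text.

```python
import math


def matvec(W, x):
    """Apply a 1x1 convolution at one position: a plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(values):
    """Numerically stable softmax over a list of scores."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_row(f, g, j):
    """Return [beta_{j,i} for all i], with s_ij = f(x_i)^T g(x_j)."""
    scores = [sum(a * b for a, b in zip(f_i, g[j])) for f_i in f]
    return softmax(scores)


def self_attention(X, W_f, W_g, W_h, W_v, lam):
    """X is a list of N position vectors, each with C channels."""
    f = [matvec(W_f, x) for x in X]
    g = [matvec(W_g, x) for x in X]
    h = [matvec(W_h, x) for x in X]
    Y = []
    for j in range(len(X)):
        beta = attention_row(f, g, j)
        # Weighted sum of h(x_i) over all positions i.
        mixed = [sum(b * h_i[k] for b, h_i in zip(beta, h))
                 for k in range(len(h[0]))]
        o_j = matvec(W_v, mixed)  # v(.) maps back to C channels
        Y.append([lam * o + x for o, x in zip(o_j, X[j])])
    return Y


# Toy example: C = 8 channels, N = 3 positions, reduced width C // 8 = 1.
C, N = 8, 3
X = [[0.1 * (c + n) for c in range(C)] for n in range(N)]
W_reduced = [[0.01 * (c + 1) for c in range(C)] for _ in range(C // 8)]
W_v = [[0.5] for _ in range(C)]  # shape C x (C // 8)
Y = self_attention(X, W_reduced, W_reduced, W_reduced, W_v, lam=0.5)
print(len(Y), len(Y[0]))
```

With `lam` set to 0 the layer reduces to the identity, which is why a learnable scalar of this kind lets the network start from purely local convolutional features and mix in attention gradually.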
[0076] Table 1 provides detailed information on parameters such as kernel size and stride for each convolutional layer and self-attention layer.
[0077] Table 1 Generator Model Parameter Structure
[0078]
[0079] A multi-convolutional discriminator model is constructed to determine the authenticity of original and synthetic samples. This embodiment's multi-convolutional discriminator model consists of four stacked convolutional layers. The probability of a sample being true is normalized by the sigmoid function in the last layer of the network. The construction and processing flow of each layer are described in detail below.
[0080] The multi-convolutional discriminator model structure is shown in Figure 2. The model includes an input layer, whose inputs are the data samples generated by the generator and the original data samples. It also includes four convolutional layers (Conv-2D 1, Conv-2D 2, Conv-2D 3, and Conv-2D 4), and uses batch normalization to accelerate convergence during deep network training and improve training stability. The activation function is LeakyReLU, which introduces non-linearity. Conv-2D 1 is mainly responsible for extracting basic features such as texture and edges from the input data; Conv-2D 2 extracts more complex features, such as shape information, from the basic features produced by the first layer; after the first two layers complete basic feature extraction, Conv-2D 3 focuses on more complex and abstract feature information, such as the features of the eyes and nose in facial recognition, allowing the model to begin understanding specific content in the image; the final convolutional layer, Conv-2D 4, integrates all the extracted feature information to form a comprehensive understanding of the input data and determine its authenticity.
[0081] Table 2 provides information on parameters such as kernel size and stride for each layer of the multi-convolution discriminator model.
[0082] Table 2. Parameter Structure of the Multiconvolutional Discriminator Model
[0083]
[0084] A loss function is designed to guide the alternating optimization of the generator model and the multi-convolutional discriminator model on the training dataset. This adversarial training aims to bring the two models into balance. Once the generator model and the multi-convolutional discriminator model reach a dynamic equilibrium, the trained generator model is saved. The loss function in this embodiment includes a misjudgment loss, an adversarial loss, and a hinge loss. The misjudgment loss guides the generator model to train toward synthesizing adversarial examples. The adversarial loss maintains the dynamic balance between generator and discriminator performance. The hinge loss limits the magnitude of the adversarial perturbations in the generator model's output.
[0085] The design of the loss function and the parameter settings during the training process will now be explained in detail.
[0086] The loss function includes the misjudgment loss $L_{adv}$ that supervises the generator, the hinge loss $L_{hinge}$ that constrains the perturbation magnitude, and the overall adversarial loss $L_{GAN}$ of the GAN:
[0087] $L = L_{adv} + \alpha L_{GAN} + \beta L_{hinge}$
[0088] Here, α and β are hyperparameters that control the relative importance of the different losses within the framework. The GAN model pursues several optimization objectives simultaneously during training: the adversarial loss $L_{GAN}$ encourages the generator to produce data whose distribution is similar to the original data distribution; the hinge loss $L_{hinge}$ limits the noise level to stabilize GAN training; and the misjudgment loss $L_{adv}$ drives the generator toward producing adversarial perturbations.
[0089] The misjudgment loss is expressed as follows:
[0090] $L_{adv} = \mathbb{E}_x\, \lambda_f(x + G(x), t)$
[0091] where f is the target model being attacked, t is the true label of the input data x, and $\lambda_f$ reflects the ability of the target model f to identify synthetic samples; it is implemented using the cross-entropy loss function.
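Since the misjudgment term is stated to be implemented with cross-entropy, a minimal sketch of that computation may help. It assumes the target model f returns one list of raw class scores (logits) per perturbed sample; the function names are hypothetical, and how the sign of this term enters the generator update is not specified here, so that choice is left to the caller.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def misjudgment_loss(batch_logits, labels):
    """Batch mean of cross-entropy, approximating E_x[lambda_f(x + G(x), t)].

    batch_logits[k] is assumed to be f(x_k + G(x_k)); labels[k] is t_k.
    """
    losses = [cross_entropy(z, t) for z, t in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


# Two perturbed samples scored by a hypothetical 3-class target model.
logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
print(misjudgment_loss(logits, labels=[0, 2]))
```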
[0092] To obtain adversarial examples that are close to the original samples, a hinge loss on the $\ell_2$ norm is introduced during training to impose the following constraint on the perturbation magnitude:
[0093] $L_{hinge} = \mathbb{E}_x \max(0, \|G(x)\|_2 - c)$
[0094] Here, the constant c represents the optimization bound of the perturbation; the further the perturbation norm exceeds this bound, the larger the loss. During the different training stages, a dynamic balance must be maintained between generator and discriminator performance. If the discriminator is too powerful, the generator will have difficulty fooling it and will be unable to produce sufficiently realistic samples; if the generator is too powerful, it will not provide an effective training signal for the discriminator. Their adversarial loss can be expressed as:
[0095] $L_{GAN} = \mathbb{E}_x \log D(x) + \mathbb{E}_x \log(1 - D(x + G(x)))$
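The hinge and adversarial terms, and their combination into the total loss, can be sketched over a small batch as follows. Perturbations are represented as flat lists of values and discriminator outputs as probabilities strictly between 0 and 1; these representations and all function names are assumptions for illustration only.

```python
import math


def l2_norm(v):
    """Euclidean norm of a flattened perturbation."""
    return math.sqrt(sum(p * p for p in v))


def hinge_loss(perturbations, c):
    """Batch mean of max(0, ||G(x)||_2 - c): zero while the norm stays within c."""
    return sum(max(0.0, l2_norm(p) - c) for p in perturbations) / len(perturbations)


def gan_loss(d_real, d_fake):
    """Batch estimate of E[log D(x)] + E[log(1 - D(x + G(x)))]."""
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term


def total_loss(l_adv, l_gan, l_hinge, alpha, beta):
    """L = L_adv + alpha * L_GAN + beta * L_hinge."""
    return l_adv + alpha * l_gan + beta * l_hinge


# One perturbation inside the bound and one outside it (c = 0.1).
perturbations = [[0.03, 0.04], [0.3, 0.4]]
l_hinge = hinge_loss(perturbations, c=0.1)
l_gan = gan_loss(d_real=[0.9, 0.8], d_fake=[0.2, 0.4])
print(total_loss(l_adv=1.2, l_gan=l_gan, l_hinge=l_hinge, alpha=5, beta=1))
```

Only the second perturbation contributes to the hinge term, which shows how the bound c leaves small perturbations unpenalized while pushing larger ones back toward the bound.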
[0096] The entire training process uses the Adam optimizer with an initial learning rate of 0.001 and a learning rate decay strategy: after 50 and 80 epochs, the learning rate is reduced to 0.0001 and 0.00001, respectively. The batch size for all models is set to 64, and the maximum number of iterations is set to 100. Training terminates when performance on the validation set fails to improve for 10 consecutive epochs. In the loss function, α is set to 5 and β is set to 1 to stabilize model training and control the weight of the hinge loss. The hinge-loss parameter c is set to 0.1 to bound the L2 norm of the perturbation.
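The step schedule and stopping rule stated above can be sketched as two small functions. The sketch assumes 1-indexed epochs, reads "after 50 and 80 epochs" as taking effect from epochs 51 and 81, treats the limit of 100 iterations as 100 epochs, and assumes a validation score where higher is better; none of these readings is confirmed by the text, and the function names are hypothetical.

```python
def learning_rate(epoch):
    """Step schedule: 1e-3 up to epoch 50, 1e-4 up to epoch 80, then 1e-5."""
    if epoch <= 50:
        return 1e-3
    if epoch <= 80:
        return 1e-4
    return 1e-5


def stopping_epoch(val_scores, patience=10, max_epochs=100):
    """Return the epoch at which training ends.

    val_scores[k] is the validation score after epoch k + 1 (higher is better).
    Training stops once `patience` consecutive epochs bring no improvement,
    or when the scores or the epoch budget run out.
    """
    best = float("-inf")
    epochs_without_gain = 0
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        if score > best:
            best = score
            epochs_without_gain = 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                return epoch
    return min(len(val_scores), max_epochs)


# A run that peaks at epoch 1 and never improves again stops at epoch 11.
print(learning_rate(60), stopping_epoch([1.0] + [0.5] * 20))
```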
[0097] Using a trained generator model, adversarial perturbations are generated from the original samples. These perturbations are then output through the model's forward propagation and superimposed on the original samples to obtain the adversarial examples. This is explained in detail below:
[0098] First, select as the starting point original samples that the target network classifies correctly in a non-adversarial setting. Next, load the pre-trained generator model, which has been adversarially trained to generate subtle, imperceptible perturbations based on the original samples. These perturbations are sufficient to mislead the target network into making incorrect decisions. Then, input the selected original samples into the generator model and output targeted adversarial perturbations.
[0099] Subsequently, the adversarial perturbation is superimposed onto the original samples to generate the final adversarial examples. During the superposition process, the perturbation needs to be constrained to a certain extent to ensure that the adversarial examples remain within the valid data range. After obtaining the adversarial examples, they are input into the target classification network for verification to test whether the network can correctly identify them. If the target network is successfully misled, it indicates that the attack is successful and the generator model achieves the expected performance.
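The superposition and verification steps can be sketched as follows. The sketch assumes images are flattened lists of pixel values with a valid range of [0, 1], and that the target network is available as a `predict` callable returning a class index; the range, the function names, and the toy classifier are all assumptions for illustration.

```python
def make_adversarial(x, perturbation, low=0.0, high=1.0):
    """Add the perturbation to the sample and clip to the valid data range."""
    return [min(high, max(low, xi + di)) for xi, di in zip(x, perturbation)]


def attack_succeeded(predict, x_adv, true_label):
    """The attack counts as successful if the target model's prediction changes."""
    return predict(x_adv) != true_label


def toy_predict(v):
    """Hypothetical stand-in for the target network: thresholds the pixel sum."""
    return 1 if sum(v) > 1.5 else 0


x = [0.0, 0.5, 1.0]            # original sample, classified as 0 by toy_predict
delta = [-0.25, 0.25, 0.25]    # perturbation from a (hypothetical) generator
x_adv = make_adversarial(x, delta)
print(x_adv, attack_succeeded(toy_predict, x_adv, true_label=0))
```

Clipping after the addition keeps every pixel inside the valid range even when the raw sum would leave it, at the cost of slightly shrinking the effective perturbation at saturated pixels.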
[0100] Finally, the vulnerability of the target model to adversarial attacks is evaluated based on the test results, and further guidance is provided for the design and optimization of security protection measures for deep learning models to enhance the robustness of the network.
Claims
1. A method for generating adversarial examples based on a self-attention mechanism, characterized in that it comprises the following steps:
A generator model is constructed, comprising an encoder and a decoder. The encoder comprises a first input layer, unit 1, unit 2, unit 3, and unit 4 connected in sequence. Unit 1 consists of a first convolutional layer, a first batch normalization layer, and a first Leaky ReLU layer; unit 2 consists of a second convolutional layer, a second batch normalization layer, and a second Leaky ReLU layer; unit 3 consists of a third convolutional layer, a third batch normalization layer, and a third Leaky ReLU layer; and unit 4 consists of a fourth convolutional layer, a first self-attention layer, a fourth batch normalization layer, and a fourth Leaky ReLU layer. The output of unit 1 is connected to the output of unit 3 via a residual connection module. The decoder comprises unit 5, unit 6, unit 7, and unit 8 connected in sequence. Unit 5 consists of a fifth convolutional layer, a second self-attention layer, a fifth batch normalization layer, and a fifth Leaky ReLU layer; unit 6 consists of a first deconvolutional layer, a sixth batch normalization layer, and a sixth Leaky ReLU layer; unit 7 consists of a second deconvolutional layer, a seventh batch normalization layer, and a seventh Leaky ReLU layer; and unit 8 consists of a third deconvolutional layer, an eighth batch normalization layer, and an eighth Leaky ReLU layer.
A multi-convolutional discriminator model is constructed, comprising a second input layer, a sixth convolutional layer, a seventh convolutional layer, an eighth convolutional layer, and a ninth convolutional layer connected in sequence; the second input layer receives the original data samples and the adversarial examples generated by the generator model.
A loss function is designed, consisting of a misjudgment loss function, an adversarial loss function, and a hinge loss function. The misjudgment loss function guides the generator model to train toward producing adversarial examples. The adversarial loss function maintains the dynamic balance between the performance of the generator and the discriminator. The hinge loss function limits the magnitude of the adversarial perturbation output by the generator model.
Guided by the loss function, the generator model and the multi-convolutional discriminator model are alternately optimized on the training dataset for adversarial training. Once the performance of the generator model and the multi-convolutional discriminator model reaches a dynamic balance, the trained generator model is saved.
The original image sample is input into the trained generator model to generate an adversarial perturbation, and the generated adversarial perturbation is superimposed on the original image sample to obtain the adversarial image sample.
The mapping from the convolutional feature map output by a convolutional layer to the self-attention feature map output by a self-attention layer is as follows. The convolutional feature map output by the convolutional layer is $x \in \mathbb{R}^{C \times N}$, where $C$ and $N$ denote the number of channels and the number of feature locations of the input feature map, respectively. First, a 1x1 convolution kernel linearly transforms the channel values at each pixel, mapping the image features into two feature spaces $f$ and $g$, where $f(x) = W_f x$ and $g(x) = W_g x$. The feature-space output at position $i$ is transposed and multiplied by the feature-space output at position $j$, then normalized to obtain the attention weight between the two positions:

$$\beta_{j,i} = \frac{\exp(s_{ij})}{\sum_{i=1}^{N} \exp(s_{ij})}, \qquad s_{ij} = f(x_i)^{T} g(x_j)$$

In the formula, $\beta_{j,i}$ indicates the degree to which the model, within the self-attention framework, depends on the $i$-th position when processing the $j$-th region; $s_{ij}$ denotes the weight between the $i$-th position and the $j$-th position in the image. The attention map obtained above is multiplied by the feature map in the feature space $h$ and then processed by the weight matrix $W_v$ to obtain the final output $o = (o_1, o_2, \ldots, o_N) \in \mathbb{R}^{C \times N}$, where:

$$o_j = v\left(\sum_{i=1}^{N} \beta_{j,i}\, h(x_i)\right), \qquad h(x_i) = W_h x_i, \qquad v(x_i) = W_v x_i$$

In the formula, $W_h$ and $W_v$ are weight matrices obtained through 1x1 convolution operations; $h(x_i)$ denotes the transformation applied to $x_i$ using the weight matrix $W_h$; $v(x_i)$ denotes the value of $x_i$ after transformation by the weight matrix $W_v$. Finally, $o_i$ is multiplied by a learnable scalar $\gamma$ and added to the original input feature map, yielding a feature map that incorporates attention:

$$y_i = \gamma\, o_i + x_i$$

2. The adversarial example generation method based on self-attention mechanism according to claim 1, characterized in that the misjudgment loss function is expressed as:

$$L_{\mathrm{mis}} = \mathbb{E}_{x}\, \ell_f\big(x + G(x),\, y\big)$$

where $f$ is the target attack model, $y$ is the true label of the input data $x$, $\ell_f$ reflects the ability of the target attack model $f$ to identify synthetic samples, $\mathbb{E}_{x}$ denotes the expectation over the input data $x$, and $G(x)$ denotes the adversarial perturbation generated by the generator model.
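The self-attention mapping recited in claim 1 can be illustrated with a small sketch. It is only a minimal illustration under stated assumptions: the feature map is a plain C x N list of rows, each 1x1 convolution is modelled as a matrix product over channels, and the names `self_attention`, `w_f`, `w_g`, `w_h`, `w_v`, and `gamma` are hypothetical stand-ins for the projections and the learnable scalar, not the patent's implementation.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Swap rows and columns of a matrix stored as lists of rows."""
    return [list(row) for row in zip(*a)]

def self_attention(x, w_f, w_g, w_h, w_v, gamma):
    """Apply the attention mapping to a C x N feature map x.

    w_f, w_g, w_h have shape C' x C and w_v has shape C x C'
    (assumed shapes; a 1x1 convolution acts as a per-position matrix product).
    Returns (y, beta) with y = gamma * o + x and beta[j][i] the attention weights.
    """
    f = matmul(w_f, x)            # C' x N, column i is f(x_i)
    g = matmul(w_g, x)            # C' x N, column j is g(x_j)
    h = matmul(w_h, x)            # C' x N, column i is h(x_i)
    n = len(x[0])
    s = matmul(transpose(f), g)   # N x N, s[i][j] = f(x_i)^T g(x_j)
    beta = []
    for j in range(n):
        col = [s[i][j] for i in range(n)]
        m = max(col)              # subtract the max for numerical stability
        e = [math.exp(v - m) for v in col]
        z = sum(e)
        beta.append([v / z for v in e])   # softmax over i for fixed j
    agg = matmul(h, transpose(beta))      # column j = sum_i beta[j][i] * h(x_i)
    o = matmul(w_v, agg)                  # C x N, column j is o_j
    y = [[gamma * o[c][p] + x[c][p] for p in range(n)] for c in range(len(x))]
    return y, beta
```

With `gamma` set to zero the layer returns its input unchanged, so a network of this shape can start from purely convolutional features and learn how much attention to mix in.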
3. The adversarial example generation method based on self-attention mechanism according to claim 1, characterized in that the adversarial loss function is expressed as:

$$L_{\mathrm{GAN}} = \mathbb{E}_{x} \log D(x) + \mathbb{E}_{x} \log\big(1 - D(x + G(x))\big)$$

where $\mathbb{E}_{x}$ denotes the expectation over the input data $x$, $G(x)$ denotes the adversarial perturbation generated by the generator model, and $D(\cdot)$ denotes the probability that the discriminator model judges a sample to be real.
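As a rough illustration of the adversarial loss in claim 3, the sketch below computes a batch estimate from discriminator outputs, assuming the standard GAN form: the mean log-probability on original samples plus the mean log of one minus the probability on perturbed samples. The function name `adversarial_loss` and the small stabilizing constant `eps` are assumptions for illustration, not part of the claim.

```python
import math

def adversarial_loss(d_real, d_fake, eps=1e-12):
    """Batch estimate of E[log D(x)] + E[log(1 - D(x + G(x)))].

    d_real: discriminator probabilities for original samples x.
    d_fake: discriminator probabilities for perturbed samples x + G(x).
    eps guards against log(0) (an assumption, not part of the claim).
    """
    real_term = sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term
```

Under this form the discriminator is trained to increase the value and the generator to decrease it, which is the alternating optimization the claims describe.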
4. The adversarial example generation method based on self-attention mechanism according to claim 1, characterized in that the hinge loss function is expressed as:

$$L_{\mathrm{hinge}} = \mathbb{E}_{x} \max\big(0,\ \lVert G(x) \rVert - c\big)$$

where the constant $c$ is the optimal bound on the perturbation, $G(x)$ denotes the adversarial perturbation generated by the generator model, and $\lVert G(x) \rVert$ denotes the norm of the adversarial perturbation.
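The hinge loss in claim 4 penalizes a perturbation only for the part of its norm that exceeds the bound. The sketch below is a minimal illustration; the choice of the Euclidean (L2) norm and the name `hinge_loss` are assumptions, since the claim text here does not fix the norm order.

```python
import math

def hinge_loss(perturbations, c):
    """Batch mean of max(0, ||G(x)|| - c), using the L2 norm (assumed).

    perturbations: list of flattened perturbation vectors G(x).
    c: bound below which a perturbation incurs no penalty.
    """
    total = 0.0
    for delta in perturbations:
        norm = math.sqrt(sum(v * v for v in delta))
        total += max(0.0, norm - c)
    return total / len(perturbations)
```

For example, a perturbation `[3.0, 4.0]` has norm 5, so with `c = 2.0` it contributes a penalty of 3, while any perturbation with norm at most `c` contributes nothing.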
5. An adversarial example generation system based on a self-attention mechanism, characterized in that it comprises:
a trained generator model, used to generate adversarial perturbations for the original image samples; and
an adversarial example generation module, used to superimpose the generated adversarial perturbations on the original image samples to obtain adversarial image samples.
The trained generator model is obtained according to the following steps:
A generator model is constructed, comprising an encoder and a decoder. The encoder comprises a first input layer, unit 1, unit 2, unit 3, and unit 4 connected in sequence. Unit 1 consists of a first convolutional layer, a first batch normalization layer, and a first Leaky ReLU layer; unit 2 consists of a second convolutional layer, a second batch normalization layer, and a second Leaky ReLU layer; unit 3 consists of a third convolutional layer, a third batch normalization layer, and a third Leaky ReLU layer; and unit 4 consists of a fourth convolutional layer, a first self-attention layer, a fourth batch normalization layer, and a fourth Leaky ReLU layer. The output of unit 1 is connected to the output of unit 3 via a residual connection module. The decoder comprises unit 5, unit 6, unit 7, and unit 8 connected in sequence. Unit 5 consists of a fifth convolutional layer, a second self-attention layer, a fifth batch normalization layer, and a fifth Leaky ReLU layer; unit 6 consists of a first deconvolutional layer, a sixth batch normalization layer, and a sixth Leaky ReLU layer; unit 7 consists of a second deconvolutional layer, a seventh batch normalization layer, and a seventh Leaky ReLU layer; and unit 8 consists of a third deconvolutional layer, an eighth batch normalization layer, and an eighth Leaky ReLU layer.
A multi-convolutional discriminator model is constructed, comprising a second input layer, a sixth convolutional layer, a seventh convolutional layer, an eighth convolutional layer, and a ninth convolutional layer connected in sequence; the second input layer receives the original data samples and the adversarial examples generated by the generator model.
A loss function is designed, consisting of a misjudgment loss function, an adversarial loss function, and a hinge loss function. The misjudgment loss function guides the generator model to train toward producing adversarial examples. The adversarial loss function maintains the dynamic balance between the performance of the generator and the discriminator. The hinge loss function limits the magnitude of the adversarial perturbation output by the generator model.
Guided by the loss function, the generator model and the multi-convolutional discriminator model are alternately optimized on the training dataset for adversarial training. Once the performance of the generator model and the multi-convolutional discriminator model reaches a dynamic balance, the trained generator model is obtained.
The mapping from the convolutional feature map output by a convolutional layer to the self-attention feature map output by a self-attention layer is as follows. The convolutional feature map output by the convolutional layer is $x \in \mathbb{R}^{C \times N}$, where $C$ and $N$ denote the number of channels and the number of feature locations of the input feature map, respectively. First, a 1x1 convolution kernel linearly transforms the channel values at each pixel, mapping the image features into two feature spaces $f$ and $g$, where $f(x) = W_f x$ and $g(x) = W_g x$. The feature-space output at position $i$ is transposed and multiplied by the feature-space output at position $j$, then normalized to obtain the attention weight between the two positions:

$$\beta_{j,i} = \frac{\exp(s_{ij})}{\sum_{i=1}^{N} \exp(s_{ij})}, \qquad s_{ij} = f(x_i)^{T} g(x_j)$$

In the formula, $\beta_{j,i}$ indicates the degree to which the model, within the self-attention framework, depends on the $i$-th position when processing the $j$-th region; $s_{ij}$ denotes the weight between the $i$-th position and the $j$-th position in the image. The attention map obtained above is multiplied by the feature map in the feature space $h$ and then processed by the weight matrix $W_v$ to obtain the final output $o = (o_1, o_2, \ldots, o_N) \in \mathbb{R}^{C \times N}$, where:

$$o_j = v\left(\sum_{i=1}^{N} \beta_{j,i}\, h(x_i)\right), \qquad h(x_i) = W_h x_i, \qquad v(x_i) = W_v x_i$$

In the formula, $W_h$ and $W_v$ are weight matrices obtained through 1x1 convolution operations; $h(x_i)$ denotes the transformation applied to $x_i$ using the weight matrix $W_h$; $v(x_i)$ denotes the value of $x_i$ after transformation by the weight matrix $W_v$. Finally, $o_i$ is multiplied by a learnable scalar $\gamma$ and added to the original input feature map, yielding a feature map that incorporates attention:

$$y_i = \gamma\, o_i + x_i$$

6. The adversarial example generation system based on self-attention mechanism according to claim 5, characterized in that the misjudgment loss function is expressed as:

$$L_{\mathrm{mis}} = \mathbb{E}_{x}\, \ell_f\big(x + G(x),\, y\big)$$

where $f$ is the target attack model, $y$ is the true label of the input data $x$, $\ell_f$ reflects the ability of the target attack model $f$ to identify synthetic samples, $\mathbb{E}_{x}$ denotes the expectation over the input data $x$, and $G(x)$ denotes the adversarial perturbation generated by the generator model.
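The misjudgment loss of claim 6 can be sketched as follows. The claim only says that the inner term reflects the target model's ability to identify synthetic samples; taking it to be the probability the target model assigns to the true label is an assumption made here for illustration, as are the names `misjudgment_loss` and `toy_target_model`.

```python
def misjudgment_loss(target_model, images, perturbations, labels):
    """Batch mean of the target model's true-label probability on x + G(x).

    Assumption: the recognition term is the probability assigned to the
    true label y, so a lower value means the target model is misled more often.
    """
    total = 0.0
    for x, delta, y in zip(images, perturbations, labels):
        x_adv = [p + d for p, d in zip(x, delta)]  # superimpose the perturbation
        probs = target_model(x_adv)
        total += probs[y]
    return total / len(images)

def toy_target_model(x):
    """Hypothetical two-class model: class-1 probability is the clipped mean pixel."""
    mean = sum(x) / len(x)
    p1 = min(1.0, max(0.0, mean))
    return [1.0 - p1, p1]
```

In this toy setting, a perturbation that darkens a class-1 image lowers the loss, which is the direction in which the generator would be trained.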
7. The adversarial example generation system based on self-attention mechanism according to claim 5, characterized in that the adversarial loss function is expressed as:

$$L_{\mathrm{GAN}} = \mathbb{E}_{x} \log D(x) + \mathbb{E}_{x} \log\big(1 - D(x + G(x))\big)$$

where $\mathbb{E}_{x}$ denotes the expectation over the input data $x$, $G(x)$ denotes the adversarial perturbation generated by the generator model, and $D(\cdot)$ denotes the probability that the discriminator model judges a sample to be real.
8. The adversarial example generation system based on self-attention mechanism according to claim 5, characterized in that the hinge loss function is expressed as:

$$L_{\mathrm{hinge}} = \mathbb{E}_{x} \max\big(0,\ \lVert G(x) \rVert - c\big)$$

where the constant $c$ is the optimal bound on the perturbation, $G(x)$ denotes the adversarial perturbation generated by the generator model, and $\lVert G(x) \rVert$ denotes the norm of the adversarial perturbation.
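The adversarial example generation module of claim 5 superimposes the generator's perturbation on the original image. A minimal sketch of that step is given below; clipping the result to a valid pixel range is an assumption added for illustration and is not recited in the claims, and the name `superimpose` is hypothetical.

```python
def superimpose(image, perturbation, lo=0.0, hi=1.0):
    """Add a perturbation to a flattened image pixel by pixel.

    The result is clipped to [lo, hi] so it remains a valid image
    (clipping is an assumption, not part of the claimed system).
    """
    if len(image) != len(perturbation):
        raise ValueError("image and perturbation must have the same length")
    return [min(hi, max(lo, p + d)) for p, d in zip(image, perturbation)]
```

In use, the perturbation would come from the trained generator applied to the same image, so the module itself needs no access to the discriminator or the target model.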