Variable facial feature-based clean-label backdoor attack method

By utilizing hairstyle and hair color as variable features in the face recognition model as backdoor triggers, and combining them with adversarial perturbations to generate poisoned samples, the problem of insufficient learning of backdoor triggers in existing technologies is solved, and a backdoor attack with high concealment and high success rate is achieved.

WO2026137214A1 · PCT designated stage · Publication Date: 2026-07-02 · SHANGHAI CHENGDIAN FUZHI TECH CO LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
SHANGHAI CHENGDIAN FUZHI TECH CO LTD
Filing Date
2024-12-25
Publication Date
2026-07-02

AI Technical Summary

Technical Problem

Existing clean-label backdoor attack methods suffer from insufficient feature learning of backdoor triggers during model training, resulting in low attack success rates. Furthermore, fixed-pattern triggers are easily detected, making it difficult to achieve effective backdoor attacks.

Method used

A clean-label backdoor attack method based on variable facial features is adopted. By modifying the hairstyle and hair color in the facial attributes as backdoor triggers, adversarial perturbations are combined to generate poisoned samples to train the backdoor model, ensuring that the triggers are highly correlated with the facial images, thereby improving the concealment and attack effectiveness.

Benefits of technology

It improves the concealment and effectiveness of backdoor attacks, ensures that triggers are difficult to detect, enhances the model's learning of trigger features, and improves the success rate and targeting of attacks.



Abstract

Disclosed in the present invention is a variable facial feature-based clean-label backdoor attack method, comprising: acquiring a facial recognition dataset D1, a sub-dataset D2, and a classification neural network; training the classification neural network by means of D1 to obtain a facial recognition model; for each clean sample in the sub-dataset D2, generating an adversarial sample; embedding a backdoor trigger into the adversarial sample to generate a poisoned sample, wherein a class label of the poisoned sample is a class label of the clean sample corresponding to the poisoned sample, and all poisoned samples constitute a poisoned dataset D3; and combining D1 and D3 to train the facial recognition model to obtain a backdoored model, using any facial image to generate a poisoned sample for attack, and attacking the backdoored model. The present invention combines adversarial perturbations and facial attribute editing technology, does not depend on changing labels of training samples, but achieves implanting of a backdoor by modifying hairstyle and hair color, thereby improving the stealthiness of backdoor attacks, and also enhancing learning of trigger features by the model, thus improving the effectiveness of attacks.

Description

Clean-label backdoor attack method based on variable facial features

Technical Field

[0001] This invention relates to the field of artificial intelligence security technology, and in particular to a clean-label backdoor attack method based on variable facial features.

Background Technology

[0002] Facial recognition technology, as a crucial component of biometric identification, has experienced rapid development and widespread application in recent years. From security verification and identity recognition to personalized services, facial recognition technology plays a vital role in various fields such as finance, security, and social networking. However, with the deepening application of this technology, its security issues have gradually been exposed, especially backdoor attacks targeting deep learning models, which pose a serious threat to the security of facial recognition systems.

[0003] A backdoor attack injects specific data or structures during the training phase so that the model behaves incorrectly under certain triggering conditions. In facial recognition systems, attackers may manipulate the model by adding samples with specific facial attributes to the training data or by embedding specific hidden layers in the model structure. Once a backdoor has been injected, the attacker can trigger it with a specific facial image, causing the model to output an incorrect classification and thereby compromising the security and reliability of the system. Existing backdoor attacks have improved the stealth of triggers to the point where poisoned samples are almost indistinguishable from the original samples, but they still modify the labels of the poisoned samples. Backdoor defenses can therefore detect poisoned samples in the training set by examining the relationship between images and labels. Clean-label backdoor attacks emerged in response. These attacks do not modify the original labels of poisoned images during the training phase. Instead, they generate poisoned images by perturbing the target image (which carries the target label) and add triggers to the source image to generate a backdoor image that is misclassified. For example, Turner et al. proposed the Label-Consistent Backdoor Attack (LC), which perturbs the target image through generative models or adversarial perturbations to reduce the influence of robust features in the target image; see Turner A, Tsipras D, Madry A. Label-consistent backdoor attacks[J]. arXiv preprint arXiv:1912.02771, 2019. Saha et al. proposed the Hidden Trigger Backdoor Attack (HTBA), which associates the backdoor trigger with the target label through feature perturbation; see Saha A, Subramanya A, Pirsiavash H. Hidden trigger backdoor attacks[C]. Proceedings of the AAAI conference on artificial intelligence, 2020: 11957-11965. Although the perturbations generated by these attacks during the training phase can be considered sample-specific, the following problems remain:

[0004] (1) The success rate of clean-label backdoor attacks is limited by their core strategy: not changing the label of the poisoned sample. This strategy causes the model to learn both the normal features of the target class and the features of the backdoor trigger during training. Since the salient features of the poisoned sample often prompt the model to classify it correctly, the model may ignore the features of the backdoor trigger, thus reducing the success rate of the attack.

[0005] (2) The model tends to learn robust features associated with the target label, which may suppress the features of the backdoor trigger. During model training, the dominant role of robust features makes it difficult for the neural network to effectively learn the pattern of the backdoor trigger, thus affecting the effectiveness of backdoor attacks.

[0006] (3) Because the model tends to capture features that have a greater impact on classification decisions during training, the features of backdoor triggers may not be fully learned and valued. This learning deficiency means that even if the triggers are correctly embedded in the image, the model may not be able to correctly identify and respond to these triggers during the inference phase, leading to the failure of the backdoor attack.

[0007] (4) Both of the above methods use fixed-pattern triggers, which are easily detected by victims.

Summary of the Invention

[0008] The purpose of this invention is to provide a clean-label backdoor attack method based on variable facial features that solves the above problems, does not rely on changing the labels of training samples, and makes backdoor triggers difficult to detect.

[0009] To achieve the above objective, the technical solution adopted by the present invention is a clean-label backdoor attack method based on variable facial features, comprising the following steps:

[0010] S1, obtain face recognition dataset D1, sub-dataset D2 and classification neural network. The face recognition dataset D1 contains M categories, each category contains multiple clean samples, and the sub-dataset consists of multiple clean samples of the same category in D1.

[0011] S2, use D1 to train a classification neural network to obtain a face recognition model. The face recognition model takes clean samples as input and outputs the predicted category and predicted probability distribution of the clean samples.

[0012] S3, For each clean sample in the subset D2, generate an adversarial sample using the PGD algorithm;

[0013] S4, select a reference image, and use the hairstyle and hair color in it as a backdoor trigger. Embed the backdoor trigger in each adversarial sample to obtain a poisoned sample. The category label of the poisoned sample is the category label of its corresponding clean sample, and all poisoned samples constitute the poisoned dataset D3.

[0014] S5, the face recognition dataset D1 and the poisoned dataset D3 are combined to form the backdoor dataset D. The face recognition model is trained using the backdoor dataset D to obtain the backdoor model. The input of the backdoor model is the sample in the backdoor dataset D, and the output is the predicted category and predicted probability distribution of the sample.

[0015] S6, attack backdoor model;

[0016] Select any face image, generate an adversarial sample according to S3, generate the poisoned sample to be used in the attack according to S4, input it into the backdoor model, and output its predicted category and predicted probability distribution.

[0017] Preferably, the multiple clean samples of one category in D1 are frontal images of the same person with different expressions.

[0018] Preferably, the face recognition dataset is the CelebA face dataset or the MS-Celeb-1M face recognition dataset, and the classification neural network is a VGG network, an AlexNet network, or a ResNet network.

[0019] Preferably, in S2 the loss function of the classification neural network is the cross-entropy loss function L_CE, obtained from the following formula:

[0020] $$L_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\,\log(p_{ic})$$

[0021] In the formula, N is the total number of clean samples in a batch fed into the classification neural network, the i-th clean sample is denoted sample i, c is a category among the M categories, y_ic is an indicator function with y_ic = 1 if the true class of sample i is c and y_ic = 0 otherwise, and p_ic is the probability predicted by the classification neural network that sample i belongs to category c.
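As a minimal sketch of how the cross-entropy loss described above can be evaluated, the following NumPy snippet assumes the loss is averaged over the N samples of a batch and that the indicator y_ic is given as integer class labels. The function name `cross_entropy`, the `eps` guard, and the toy probabilities are illustrative assumptions, not part of the source.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Mean cross-entropy over a batch.

    probs  : (N, M) array, row i holds p_ic for sample i over the M categories.
    labels : (N,) integer array of true categories, so y_ic = 1 iff labels[i] == c.
    eps    : small constant (an assumption of this sketch) that avoids log(0).
    """
    probs = np.asarray(probs, dtype=np.float64)
    labels = np.asarray(labels)
    n = probs.shape[0]
    # y_ic is one-hot, so the inner sum over c keeps only the true-class term.
    true_class_probs = probs[np.arange(n), labels]
    return float(-np.mean(np.log(true_class_probs + eps)))

# Toy batch: N = 2 samples, M = 3 categories.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
loss = cross_entropy(probs, labels)  # equals -(ln 0.7 + ln 0.8) / 2
```

Indexing the true-class probability directly is equivalent to the double sum because every term with y_ic = 0 vanishes.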

[0022] Preferably, in S4 the backdoor trigger is embedded in the adversarial sample as follows:

[0023] Sa1: Use the facial landmark predictor in the Dlib library to detect key points in the adversarial sample, align the key points to a preset position or shape, and then crop the adversarial sample to a preset size to obtain the first sample.

[0024] Sa2: Input the first sample into the HairCLIPv2 model, modify the hairstyle and hair color in the first sample according to the hairstyle and hair color in the reference image, and obtain the poisoned sample.

[0025] Preferably, the cross-entropy loss function is also used when training the backdoor model in S5.

[0026] Compared with the prior art, the advantages of the present invention are as follows:

[0027] (1) This invention proposes a new clean-label backdoor attack method based on variable facial features. The method does not rely on changing the labels of training samples; instead, it implants the backdoor by modifying high-level, complex facial attributes such as hairstyle and hair color. This design exploits natural variation in facial attributes, improves the concealment of the backdoor attack, and makes the trigger difficult to detect visually, so that the attack can be carried out without arousing suspicion.

[0028] (2) This method combines adversarial perturbation and facial attribute editing techniques. This combination not only improves the concealment of backdoor attacks but also enhances the model's learning of trigger features, thereby increasing the effectiveness of the attack. The proposed method not only provides a new perspective and tool for the security research of face recognition systems but also presents new challenges and research directions for the defense against backdoor attacks.

[0029] (3) The backdoor trigger for each sample is customized based on the sample itself, ensuring a high correlation between the trigger and the face image, which increases the concealment and targeting of the backdoor attack.

Description of the Drawings

[0030] Figure 1 is a flowchart of the present invention;

[0031] Figure 2 is a schematic diagram of the process of Embodiment 2 of the present invention.

Detailed Description

[0032] The invention will now be further described with reference to the accompanying drawings.

[0033] Example 1: Referring to Figures 1 and 2, a clean-label backdoor attack method based on variable facial features includes the following steps:

[0034] S1, obtain face recognition dataset D1, sub-dataset D2 and classification neural network. The face recognition dataset D1 contains M categories, each category contains multiple clean samples, and the sub-dataset consists of multiple clean samples of the same category in D1.

[0035] S2, use D1 to train a classification neural network to obtain a face recognition model. The face recognition model takes clean samples as input and outputs the predicted category and predicted probability distribution of the clean samples.

[0036] S3, For each clean sample in the subset D2, generate an adversarial sample using the PGD algorithm;

[0037] S4, select a reference image, and use the hairstyle and hair color in it as a backdoor trigger. Embed the backdoor trigger in each adversarial sample to obtain a poisoned sample. The category label of the poisoned sample is the category label of its corresponding clean sample, and all poisoned samples constitute the poisoned dataset D3.

[0038] S5, the face recognition dataset D1 and the poisoned dataset D3 are combined to form the backdoor dataset D. The face recognition model is trained using the backdoor dataset D to obtain the backdoor model. The input of the backdoor model is the sample in the backdoor dataset D, and the output is the predicted category and predicted probability distribution of the sample.

[0039] S6, attack backdoor model;

[0040] Select any face image, generate an adversarial sample according to S3, generate the poisoned sample to be used in the attack according to S4, input it into the backdoor model, and output its predicted category and predicted probability distribution.

[0041] In this embodiment, the multiple clean samples of one category in D1 are frontal images of the same person with different expressions. The face recognition dataset is either the CelebA face dataset or the MS-Celeb-1M face recognition dataset, and the classification neural network is a VGG network, an AlexNet network, or a ResNet network.

[0042] In S2, the loss function of the classification neural network is the cross-entropy loss function L_CE, obtained from the following formula:

[0043] $$L_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\,\log(p_{ic})$$

[0044] In the formula, N is the total number of clean samples in a batch fed into the classification neural network, the i-th clean sample is denoted sample i, c is a category among the M categories, y_ic is an indicator function with y_ic = 1 if the true class of sample i is c and y_ic = 0 otherwise, and p_ic is the probability predicted by the classification neural network that sample i belongs to category c.

[0045] In S4, the specific steps for embedding the backdoor trigger in the adversarial sample are as follows:

[0046] Sa1: Use the facial landmark predictor in the Dlib library to detect key points in the adversarial sample, align the key points to a preset position or shape, and then crop the adversarial sample to a preset size to obtain the first sample.

[0047] Sa2: Input the first sample into the HairCLIPv2 model, modify the hairstyle and hair color in the first sample according to the hairstyle and hair color in the reference image, and obtain the poisoned sample.

[0048] The cross-entropy loss function is also used when training the backdoor model in S5.

[0049] Example 2: Referring to Figures 1 and 2, we will describe the method of the present invention in conjunction with Figure 2 for clearer illustration. Figure 2 is divided into three parts: left, middle, and right, corresponding to the attack phase, training phase, and inference phase, respectively.

[0050] The left side represents the attack phase, which is the process of generating poisoned samples from clean samples. As shown in the figure, the clean sample is a facial image of a woman with class label 0. We select several facial images of this woman to form the subset D2. For each image in D2, a poisoned sample is generated through S3 and S4. The class label of each poisoned sample is still 0.

[0051] The middle section represents the training phase, which includes training a classification network using the face recognition dataset D1 to obtain a face recognition model, and training a face recognition model using the backdoor dataset D to obtain a backdoor model. As shown in Figure 2, the face recognition dataset D1 contains clean samples from different people, with each person corresponding to a different category. The backdoor dataset D includes clean samples and poisoned samples.

[0052] The right side represents the inference stage, where two different male face images are selected. The class labels for these two men are assumed to be 1 and 2, respectively. The poisoned samples to be attacked are generated through S3 and S4. After being input into the backdoor model, they will be misidentified as label 0, which means they will be misidentified as the class of the subset D2, indicating that the backdoor attack is successful.

[0053] Example 3: Referring to Figures 1 and 2, based on Example 1, we provide a more detailed implementation process including the following steps;

[0054] S1, select face recognition dataset D1, subset D2, and classification neural network;

[0055] The face recognition dataset D1 should contain diverse face samples to ensure the generalization ability of the backdoor attack method. This dataset is divided into an original training set and an original test set. The original training set is used to train the classification neural network to obtain the face recognition model, and the original test set is used to evaluate the performance of the face recognition model on unseen data. The subset D2 consists of clean samples of the same category selected from D1; if one category corresponds to one person, multiple clean samples of the same person are selected. The classification neural network includes, but is not limited to, common deep learning models such as VGG, AlexNet, or ResNet networks.

[0056] S2, use D1 to train the classification neural network to obtain the face recognition model, specifically including steps S21~S26;

[0057] S21, Select the original training set in S1 as the training data;

[0058] S22, Training parameter configuration: Set the parameters required for training, including but not limited to learning rate, batch size, epochs, etc., to provide necessary parameter support for training the classification neural network;

[0059] S23, Choose the cross-entropy loss function L_CE as the loss function of the classification neural network;

[0060] S24, Initialize the parameters of the classification neural network, including weights and biases, to prepare for the training process;

[0061] S25, in each iteration epoch, the original training set is input into the classification neural network in batches according to the batch size, the loss function value of each batch is calculated, and the parameters of the classification neural network are updated through backpropagation;

[0062] S26, Repeat step S25 until the preset number of epochs is reached to obtain the face recognition model.
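Steps S22 to S26 describe an ordinary mini-batch training loop. The sketch below illustrates that loop under strong simplifying assumptions: a linear softmax classifier stands in for the VGG, AlexNet, or ResNet network, and two synthetic 2-D clusters stand in for face images of two identities. The hyperparameter values and the function name `train` are hypothetical and do not come from the source.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(x, y, num_classes, lr=0.5, batch_size=16, epochs=50, seed=0):
    """Mini-batch gradient descent on a linear softmax classifier."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w = np.zeros((d, num_classes))   # S24: initialise weights
    b = np.zeros(num_classes)        # S24: initialise biases
    for _ in range(epochs):          # S26: repeat for the configured number of epochs
        order = rng.permutation(n)
        for start in range(0, n, batch_size):    # S25: feed the data in batches
            idx = order[start:start + batch_size]
            p = softmax(x[idx] @ w + b)
            # Gradient of the mean cross-entropy w.r.t. the logits: (p - one_hot) / batch.
            p[np.arange(len(idx)), y[idx]] -= 1.0
            grad = p / len(idx)
            w -= lr * (x[idx].T @ grad)          # S25: parameter update
            b -= lr * grad.sum(axis=0)
    return w, b

# Toy data: two well-separated clusters as stand-ins for two categories.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(-2.0, 0.5, (50, 2)), rng.normal(2.0, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
w, b = train(x, y, num_classes=2)
accuracy = float((softmax(x @ w + b).argmax(axis=1) == y).mean())
```

With a deep network, the analytic gradient would be replaced by backpropagation in a framework of choice; the batching and epoch structure stay the same.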

[0063] S3. For each clean sample in the subset D2, generate an adversarial sample using the PGD algorithm. The purpose of this step is to introduce perturbations into the clean samples. These perturbations are expected to be almost imperceptible to the eyes, but can significantly affect the decision-making process of the model, creating conditions for the implantation of backdoor triggers. Therefore, this embodiment gives the specific steps S31 to S34 for generating adversarial samples.

[0064] S31, Set the perturbation magnitude;

[0065] First, the range of the perturbation needs to be determined. It should attract as little visual attention as possible while still affecting the model's decision-making process. We used the PGD algorithm to add perturbations in the ranges [-16, 16] and [-32, 32]. When the perturbation is small, the difference between the generated adversarial sample and the original image is difficult to distinguish with the naked eye; as the perturbation increases, the difference grows. To give the backdoor attack of this invention good concealment, this embodiment selects the range [-16, 16].

[0066] S32, Select the perturbation algorithm;

[0067] Choose the adversarial attack algorithm: PGD algorithm, which can effectively find and exploit the weaknesses of the model while keeping the appearance of the image unchanged, and interfere with the original features of the image.

[0068] S33, iteratively generate adversarial examples;

[0069] The PGD algorithm processes clean samples iteratively. In each iteration, the perturbation on the adversarial sample is adjusted according to the current gradient information so as to maximize its impact on the model's output. These perturbations can include adding noise, adjusting pixel values, or other transformations. The magnitude of the perturbation is controlled by the predefined constraint. The perturbed sample is used as the input to the next iteration, and the process continues until the predefined number of iterations is reached, yielding the final adversarial sample. Over the iterations, the input is gradually altered to mislead the target model. Regarding the loss function during iteration: a loss function is defined, typically the cross-entropy loss, with the difference between the model's predicted output and the true label as the optimization objective. By optimizing this loss, the perturbation is steered so that the model's predictions on the perturbed samples become biased. Regarding the number of iterations: an appropriate number needs to be set so that perturbation generation is both sufficient and efficient. Too many iterations may lead to overfitting, while too few may fail to destroy the original features.

[0070] S34, Evaluate the perturbation effect;

[0071] After generating the adversarial samples, evaluate their impact on the model's predictions. This can be done by testing the model's accuracy on the perturbed samples, ensuring that the perturbation effectively disrupts the features the model originally learned.
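The iterative procedure of S31 to S34 can be illustrated with the usual L∞ form of PGD: take a signed-gradient step that increases the loss, then project back into the allowed range around the original input. The sketch below is a toy illustration only. It assumes a linear softmax model (so the gradient can be written analytically), pixel values on a 0 to 255 scale, a bound of 16, a step size of 2, and 10 iterations; the source specifies only the [-16, 16] range, so every other value and all function names here are assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_loss(x, label, w, b):
    """Cross-entropy of a linear softmax model (logits = x @ w + b) on one sample."""
    return float(-np.log(softmax(x @ w + b)[label]))

def pgd_linf(x, label, w, b, eps=16.0, step=2.0, iters=10):
    """Untargeted L-infinity PGD; eps, step and iters are illustrative choices."""
    onehot = np.zeros(w.shape[1])
    onehot[label] = 1.0
    x_adv = x.copy()
    for _ in range(iters):
        p = softmax(x_adv @ w + b)
        grad = w @ (p - onehot)                    # d(cross-entropy)/dx for this model
        x_adv = x_adv + step * np.sign(grad)       # signed-gradient ascent on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project into the [-eps, eps] range
        x_adv = np.clip(x_adv, 0.0, 255.0)         # keep valid pixel values
    return x_adv

# Toy "image" of 64 pixels and a random 3-class linear model.
rng = np.random.default_rng(0)
x = rng.uniform(0, 255, size=64)
w = rng.normal(0, 0.001, size=(64, 3))
b = np.zeros(3)
label = int(np.argmax(x @ w + b))

x_adv = pgd_linf(x, label, w, b)
max_change = float(np.abs(x_adv - x).max())        # S31: stays within the chosen bound
loss_clean = ce_loss(x, label, w, b)
loss_adv = ce_loss(x_adv, label, w, b)             # S34: compare the loss before and after
```

Comparing `loss_clean` with `loss_adv`, or the accuracy over a set of samples, corresponds to the evaluation in S34; for a deep network the gradient would come from automatic differentiation rather than a closed form.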

[0072] S4, poisoned samples are generated from adversarial samples and constitute the poisoned dataset D3, which includes S41~S45;

[0073] S41, Select Editing Tool: Select HairCLIPv2 as the face attribute editing tool. This tool can generate corresponding hairstyles based on text descriptions or reference images.

[0074] S42, Set a reference image, which is a face image containing the hairstyle and hair color that will serve as a backdoor trigger;

[0075] S43, using the facial feature predictor in the Dlib library, detect key points from the adversarial examples, align the key points to a preset position or shape, and then crop the adversarial examples to a preset size to fit the input requirements of the HairCLIPv2 model, ensuring the smooth progress of the editing process. This step yields the first sample.

[0076] S44. Input the first sample into the HairCLIPv2 model. Modify the hairstyle and hair color in the first sample according to the hairstyle and hair color in the reference image to generate a poisoned sample. The poisoned sample corresponds one-to-one with the clean sample. Therefore, the category label of the corresponding clean sample is used as the category label of the poisoned sample.

[0077] S45, all poisoned samples constitute the poisoned dataset D3;

[0078] S5. Combine the face recognition dataset D1 and the poisoned dataset D3 to form the backdoor dataset D. Use the backdoor dataset D to train the face recognition model to obtain the backdoor model. The training process and method are the same as in step S2 of this embodiment.

[0079] S6, attack the backdoor model: select any clean sample from D1, generate an adversarial sample according to S3, generate a poisoned sample according to S4, and input it into the backdoor model.

[0080] Example 4: To illustrate the effectiveness of the present invention, a comparative experiment was conducted using two existing attack methods and the present invention, as follows:

[0081] Method 1: LC (Label-Consistent Backdoor Attack). See the paper Turner A, Tsipras D, Madry A. Label-consistent backdoor attacks[J]. arXiv preprint arXiv:1912.02771, 2019.

[0082] Method 2: ISSBA (Invisible backdoor attack with sample-specific triggers), see the paper Li Y, Wu B, et al. Invisible backdoor attack with sample-specific triggers[C]. Proceedings of the IEEE / CVF international conference on computer vision, 2021: 16463-16472.

[0083] Method 3: The method of the present invention.

[0084] Using the three methods described above under the same settings, face recognition models and backdoor models were obtained. Specifically, two datasets, CelebA and MS-Celeb-1M, and two classification neural networks, VGG and AlexNet, were selected. When training the face recognition models, the two datasets were used on the two classification neural networks, resulting in 12 face recognition models, also known as clean models, as shown in Table 1. When executing the backdoor attack, the attack target label for both datasets was set to 0, and the data poisoning rate was 1%. Images with label 0 from both datasets were selected to generate poisoned images, whose labels remained 0. These were then added to the training set. Each face recognition model was then trained on this training set to implant the backdoor, yielding the corresponding backdoor model.

[0085] For evaluation, benign accuracy (BA) and attack success rate (ASR) were chosen to measure the effectiveness of the attack, as shown in Table 1. A high BA indicates strong attack concealment, because a large change in BA would be noticed by the user. A high ASR indicates an effective attack. The concealment of the trigger is reflected mainly by two metrics, peak signal-to-noise ratio (PSNR) and L∞, as shown in Table 2. A higher PSNR indicates that the poisoned image is closer to the original image and that the trigger is better concealed. A lower L∞ value indicates a smaller difference between the poisoned image and the original image and therefore higher steganographic security.
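The four metrics named above can be computed as in the following sketch. The definitions used are the common ones and are assumptions here, since the source does not spell them out: BA is accuracy on clean inputs, ASR is the share of triggered inputs from non-target classes that are predicted as the target label, PSNR is 10·log10(MAX² / MSE) with MAX = 255, and L∞ is the largest absolute per-pixel difference. All function names and toy values are illustrative.

```python
import numpy as np

def benign_accuracy(pred_clean, true_labels):
    """Fraction of clean inputs classified correctly."""
    return float(np.mean(np.asarray(pred_clean) == np.asarray(true_labels)))

def attack_success_rate(pred_triggered, true_labels, target_label):
    """Fraction of triggered inputs from non-target classes predicted as the target."""
    pred = np.asarray(pred_triggered)
    true = np.asarray(true_labels)
    mask = true != target_label          # exclude inputs that already belong to the target
    return float(np.mean(pred[mask] == target_label))

def psnr(original, modified, max_value=255.0):
    """Peak signal-to-noise ratio in dB; higher means the images are closer."""
    diff = original.astype(np.float64) - modified.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_value ** 2 / mse))

def linf(original, modified):
    """Largest absolute per-pixel difference; lower means the images are closer."""
    diff = original.astype(np.float64) - modified.astype(np.float64)
    return float(np.max(np.abs(diff)))

# Toy values, not experimental data.
ba = benign_accuracy([0, 1, 2, 2], [0, 1, 2, 1])                 # 3 of 4 correct
asr = attack_success_rate([0, 0, 1, 0], [1, 2, 1, 0], target_label=0)  # 2 of 3 redirected
original = np.full((4, 4), 100, dtype=np.uint8)
modified = original + 10                                          # uniform shift of 10
psnr_value = psnr(original, modified)
linf_value = linf(original, modified)
```

Casting to float before subtracting avoids wrap-around when the images are stored as unsigned 8-bit integers.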

[0086] Table 1. Comparison of the effects of different backdoor attack methods

[0087]

[0088] Table 2. Stealth Indicators of Clean Label Backdoor Attacks

[0089]

[0090] As shown in Table 1, the backdoor attack method proposed in this invention can achieve a high attack success rate of over 90% by poisoning 1% of the images in the training set while maintaining the accuracy of the model on clean samples. In contrast, the comparative methods LC and ISSBA both achieved extremely low attack success rates on both datasets, which are far lower than the backdoor attack method proposed in this invention, thus demonstrating the effectiveness of the proposed method.

[0091] As shown in Table 2, the PSNR and L∞ values of the method of the present invention lie between those of LC and ISSBA, indicating that the method exhibits higher concealment than LC. Compared with ISSBA, the method of the present invention has weaker concealment, but the success rate of ISSBA in clean-label backdoor attacks tends to 0.

[0092] The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A clean-label backdoor attack method based on variable facial features, characterized in that it includes the following steps: S1, obtain a face recognition dataset D1, a sub-dataset D2, and a classification neural network, wherein the face recognition dataset D1 contains M categories, each category contains multiple clean samples, and the sub-dataset D2 consists of multiple clean samples of the same category in D1; S2, use D1 to train the classification neural network to obtain a face recognition model, wherein the face recognition model takes clean samples as input and outputs the predicted category and predicted probability distribution of the clean samples; S3, for each clean sample in the sub-dataset D2, generate an adversarial sample using the PGD algorithm; S4, select a reference image and use the hairstyle and hair color in it as a backdoor trigger, and embed the backdoor trigger in each adversarial sample to obtain a poisoned sample, wherein the category label of the poisoned sample is the category label of its corresponding clean sample, and all poisoned samples constitute the poisoned dataset D3; S5, combine the face recognition dataset D1 and the poisoned dataset D3 to form the backdoor dataset D, and train the face recognition model using the backdoor dataset D to obtain the backdoor model, wherein the input of the backdoor model is a sample in the backdoor dataset D and the output is the predicted category and predicted probability distribution of the sample; S6, attack the backdoor model: select any face image, generate an adversarial sample according to S3, generate the poisoned sample to be used in the attack according to S4, input it into the backdoor model, and output its predicted category and predicted probability distribution.

2. The clean-label backdoor attack method based on variable facial features according to claim 1, characterized in that: the multiple clean samples of one category in D1 are frontal images of the same person with different expressions.

3. The clean-label backdoor attack method based on variable facial features according to claim 1, characterized in that: The face recognition dataset is either the CelebA face dataset or the MS-Celeb-1M face recognition dataset, and the classification neural network is either a VGG network, an AlexNet network, or a ResNet network.

4. The clean-label backdoor attack method based on variable facial features according to claim 1, characterized in that: in S2, the loss function of the classification neural network is the cross-entropy loss function L_CE, obtained from the following formula: $$L_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\,\log(p_{ic})$$ In the formula, N is the total number of clean samples in a batch fed into the classification neural network, the i-th clean sample is denoted sample i, c is a category among the M categories, y_ic is an indicator function with y_ic = 1 if the true class of sample i is c and y_ic = 0 otherwise, and p_ic is the probability predicted by the classification neural network that sample i belongs to category c.

5. The clean-label backdoor attack method based on variable facial features according to claim 1, characterized in that: in S4, the specific steps for embedding the backdoor trigger in the adversarial sample are as follows: Sa1: use the facial landmark predictor in the Dlib library to detect key points in the adversarial sample, align the key points to a preset position or shape, and then crop the adversarial sample to a preset size to obtain the first sample; Sa2: input the first sample into the HairCLIPv2 model and modify the hairstyle and hair color in the first sample according to the hairstyle and hair color in the reference image to obtain the poisoned sample.

6. The clean-label backdoor attack method based on variable facial features according to claim 1, characterized in that: the cross-entropy loss function is also used when training the backdoor model in S5.