Discrete cosine transform-based target detection backdoor attack defense method and system

By performing discrete cosine transform preprocessing on the training samples of the object detection model and retraining on a clean dataset, the problem of the object detection model being vulnerable to backdoor attacks is solved, and the security and reliability of the model are improved while ensuring detection accuracy.

CN122241692A · Pending · Publication Date: 2026-06-19 · HANGZHOU DIANZI UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HANGZHOU DIANZI UNIV
Filing Date
2026-03-20
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing object detection models are vulnerable to backdoor attacks, which can lead to interference or damage in practical applications. Existing defense methods are difficult to apply effectively to object detection tasks, and pruning and output distribution-based defense methods are unstable.

Method used

By performing frequency domain preprocessing with discrete cosine transform on the training samples, the backdoor triggering features are weakened, and the model is retrained with a clean dataset to restore model performance, ensuring the accuracy and security of target detection.

Benefits of technology

It effectively weakens the impact of backdoor triggering features, improves the security and reliability of the model in practical applications, has a significant defense effect, and results in minimal loss of model performance.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a target detection backdoor attack defense method and system based on discrete cosine transform. The specific steps of the method are as follows: Step (1), obtain the training dataset from the public dataset and perform preprocessing operations on all training samples in the training dataset; Step (2), train the target detection model using the preprocessed training dataset; Step (3), use a clean dataset that has not been attacked to perform secondary training on the model, so that the model refocuses on the key features missing in step (1) and mines the information contained in the clean dataset. This invention can effectively weaken the influence of backdoor triggering features while ensuring the original detection performance of the target detection model, and improve the security and reliability of the target detection model in practical application scenarios.

Description

Technical Field

[0001] This invention belongs to the field of artificial intelligence security technology, specifically relating to a target detection backdoor attack defense method and system based on discrete cosine transform.

Background Technology

[0002] Object detection, a fundamental task in Computer Vision (CV), has been widely deployed in various practical applications such as autonomous driving, smart healthcare, and smart cities. It is used to predict specific categories of target objects in input samples (images or video frames) and to locate the predicted targets. Current mainstream object detection models are built on Deep Neural Networks (DNNs). However, DNNs are vulnerable to backdoor attacks. When attackers implement backdoor attacks on object detection models, they can directly or indirectly damage the models in practical applications, interfering with or even destroying the entire application system. Therefore, researching defense methods against object detection backdoor attacks is both necessary and urgent.

[0003] In recent years, backdoor attacks on DNN models have become increasingly serious, prompting researchers to propose various methods for defense. These methods aim to ensure that DNN models, even those trained using third-party data or platforms, can avoid backdoor attack threats in practical applications. However, most current defense methods are based on image classification tasks. Due to significant differences in structure, function, and data processing between object detection models and image classification models, most existing defense methods are difficult to directly apply to object detection tasks.

[0004] Deep learning-based object detection models typically employ highly complex network structures, integrating multiple sub-modules with distinct functions. These sub-modules intertwine and work collaboratively to complete complex object detection tasks. However, this complexity presents a significant challenge to defending against backdoor attacks. Taking pruning defense methods as an example, these methods usually weaken the backdoor effect by removing weights that contribute less to the output. However, due to the complex parameter relationships between sub-modules in object detection models, it is difficult to accurately assess the impact of individual weights on overall detection performance. This makes it difficult for pruning operations to effectively remove backdoor-related parameters while maintaining detection accuracy, resulting in unstable defense effectiveness.

[0005] Furthermore, object detection tasks output a list of information covering multiple target objects, with each object typically including category, location, and confidence level. Defense methods similar to STRIP randomize the output by adding noise to the input samples and then determine the presence of a backdoor attack based on the distribution of the output. However, due to the complexity of the output format in object detection tasks, the aforementioned defense methods based on output distribution statistics are difficult to directly apply in object detection scenarios, thus limiting their effectiveness.

Summary of the Invention

[0006] To address the lack of effective existing defense methods against backdoor attacks in target detection, this invention provides a target detection backdoor attack defense method and system based on discrete cosine transform. This method can effectively weaken the influence of backdoor triggering features while ensuring the original detection performance of the target detection model, thereby improving the security and reliability of the target detection model in practical application scenarios.

[0007] The features of this invention are: (1) By performing frequency domain preprocessing based on Discrete Cosine Transform (DCT) on the target detection training samples, the backdoor triggering features that may be contained in the samples are weakened. (2) By pruning the frequency domain coefficients and performing inverse discrete cosine transform, the semantic information of the target is preserved while suppressing the backdoor triggering features, thus ensuring the accuracy of target detection. (3) By using a clean dataset to retrain the preprocessed model, the model performance is restored, and the stability and reliability of the model in practical applications are improved.

[0008] To achieve the above objectives, the technical solution of the present invention is as follows:

[0009] A target detection backdoor attack defense method based on discrete cosine transform, the specific steps of which are as follows:

[0010] Step (1): Obtain the training dataset from the public dataset and perform preprocessing operations on all training samples in the training dataset;

[0011] Step (2): Train the object detection model using the preprocessed training dataset;

[0012] Step (3): Use a clean dataset that has not been attacked to conduct a second training of the model, so that the model refocuses on the key features that were previously missing (i.e., missing from the model trained with the preprocessed dataset in step (2)). These key features include high-frequency texture and edge detail features, local contrast and small target features, color accuracy lost through color space conversion, contextual features at the boundaries of image blocks, and weak semantic features in low signal-to-noise ratio regions, among others. The model also mines the information contained in the clean dataset, including features lost in the preprocessing stage and the complete semantic and geometric information that the target detection task itself needs to learn through deep learning.

[0013] Preferably, step (1) performs preprocessing operations on all training samples in the training dataset, as follows:

[0014] (1-1) Convert the RGB format training samples in the original training dataset to YUV format sample images; preferably, the color space conversion can be performed according to the ITU-R BT.601 standard.

[0015] (1-2) Divide the YUV image obtained after color space conversion into R×R image blocks, where i and j are the two-dimensional index coordinates of each image block;

[0016] (1-3) Perform DCT transformation on each block obtained from the block division, and normalize the DCT coefficients to ensure that coefficients of different frequencies have the same order of magnitude during the transformation.

[0017] (1-4) Perform noise reduction on the DCT coefficient sequence obtained for each block based on the denoising coefficient β. For a block of size N×M, the subset of its DCT coefficients determined by the denoising coefficient β is retained, and the remaining coefficients are set to 0; here u and v are the two-dimensional frequency indices of the frequency domain coefficients after DCT transformation, used to identify the spatial frequency component corresponding to each DCT coefficient; u represents the horizontal direction (row direction), and v represents the vertical direction (column direction).

[0018] (1-5) Perform an inverse transform on the denoised DCT coefficient sequence of each block to obtain the image of each block;

[0019] (1-6) The R×R blocks obtained after DCT inverse transformation are merged and stitched together according to their positions before segmentation to form a preprocessed YUV image;

[0020] (1-7) Convert the stitched YUV image back to the RGB color space to obtain a sample image in RGB format.

[0021] Preferably, step (2) model training: the model is trained using the preprocessed training dataset, and the specific training process is as follows:

[0022] The preprocessed training dataset is input into the object detection model. Preprocessing only changes the image pixel values without changing the spatial location information of the target, so the bounding box coordinates and class labels in the original annotation file can be directly reused.

[0023] The model is initialized using backbone network weights pre-trained on the ImageNet dataset to accelerate model convergence through transfer learning.

[0024] The preprocessed images are input into the network in batches for forward propagation, and the bounding box coordinates, confidence scores and class probabilities of each target are output.

[0025] The multi-task loss function is calculated based on the network output and the annotation file. The multi-task loss function includes bounding box regression loss, target confidence loss and classification loss.

[0026] The network parameters are iteratively updated using a stochastic gradient descent optimizer combined with a cosine annealing learning rate strategy until the model converges, resulting in a preliminarily trained target detection model.
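The cosine annealing learning rate strategy mentioned above can be sketched as follows. This is a minimal illustration, not the configuration used in the patent: the function name `cosine_annealing_lr` and the numeric values (maximum rate 0.01, minimum rate 0.0001, 100 steps) are assumptions chosen only for the example.

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate at `step`, decaying from lr_max to lr_min along a half cosine."""
    # Clamp the step so values outside [0, total_steps] stay on the schedule ends.
    progress = min(max(step, 0), total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

# Illustrative values only (hypothetical, not taken from the patent).
schedule = [cosine_annealing_lr(t, 100, 0.01, 0.0001) for t in range(101)]
print(schedule[0], schedule[50], schedule[100])
```

The rate starts at the maximum, falls slowly at first, fastest around the midpoint, and flattens out near the minimum, which tends to give stable late-stage updates for an SGD optimizer.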

[0027] Preferably, step (3) model retraining: After the initial training of the model is completed, a clean dataset that has not been poisoned by attackers is used to conduct a second training of the model, so that the model refocuses on the key features that were previously missing and deeply mines the rich information contained in the clean dataset. The specific second training process is as follows:

[0028] Using the model weights initially trained in step (2) as the starting parameters, load the clean dataset that has not been poisoned; the clean dataset is the original RGB format image that has not been preprocessed, and fully retains the high-frequency edges, local textures and spatial continuity and other feature information that were filtered out in the preprocessing stage;

[0029] The clean dataset is input into the model for forward propagation. The multi-task loss function is calculated in the same way as in step (2). The network parameters are fine-tuned through iterative updates using a learning rate much smaller than that in step (2). The smaller learning rate allows the model to recover its ability to learn the filtered features without destroying the insensitivity to trigger features established in step (2).

[0030] Secondary training continues until the model's detection performance (mAP) on a clean validation set converges, resulting in the final defense model.
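The text does not define when mAP on the clean validation set counts as converged. One common reading, sketched here under that assumption, is to stop once the best mAP has not improved by more than a small margin over the last few evaluations. The function name `has_converged` and the parameters `patience` and `min_delta` are hypothetical.

```python
def has_converged(map_history, patience=3, min_delta=1e-3):
    """Return True when the best mAP of the last `patience` evaluations
    improves on the earlier best by less than `min_delta`."""
    if len(map_history) <= patience:
        return False  # not enough evaluations to judge yet
    best_before = max(map_history[:-patience])
    best_recent = max(map_history[-patience:])
    return best_recent - best_before < min_delta

# Hypothetical validation mAP values recorded after each fine-tuning epoch.
still_improving = [0.40, 0.45, 0.50, 0.55]
plateaued = [0.40, 0.50, 0.55, 0.5502, 0.5501, 0.5503]
print(has_converged(still_improving), has_converged(plateaued))
```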

[0031] Preferably, step (4) is performed after step (3), model testing: benign samples and poisoned samples are used for testing in sequence. First, benign samples are used to determine the baseline for normal output of the model, and then the model is observed to see if it produces specific abnormal output when poisoned samples are input, so as to evaluate the backdoor attack defense effect.

[0032] This invention also discloses a target detection backdoor attack defense system based on discrete cosine transform, used to execute the above method, comprising the following modules:

[0033] Data preprocessing module: Obtains the training dataset from the public dataset and performs preprocessing operations on all training samples in the training dataset;

[0034] Model training module: Trains the object detection model using a pre-processed training dataset;

[0035] Model retraining module: Using a clean dataset that has not been attacked, the model is retrained to refocus on the key features that were missing before preprocessing and to extract the information contained in the clean dataset.

[0036] Compared with the prior art, the present invention has the following significant advantages:

[0037] This invention proposes for the first time a target detection backdoor attack defense technology based on discrete cosine transform. This invention has excellent defense performance and can effectively defend against a variety of existing invisible backdoor attack methods, including FIBA, WaNet, and BppAttack, providing a solid and reliable security guarantee for the model. At the same time, this invention can ensure that the original performance and effect of the target detection model are not compromised.

Attached Figure Description

[0038] Figure 1 is a flowchart of a target detection backdoor attack defense method based on discrete cosine transform according to a preferred embodiment of the present invention.

[0039] Figure 2 is a flowchart of data preprocessing in a target detection backdoor attack defense method based on discrete cosine transform according to a preferred embodiment of the present invention.

[0040] Figure 3 is a diagram showing the effect of a target detection backdoor attack defense method based on discrete cosine transform according to a preferred embodiment of the present invention. Figure 4 is a diagram showing the effect of the data preprocessing process in a target detection backdoor attack defense method based on discrete cosine transform according to a preferred embodiment of the present invention.

[0041] Figure 5 is a block diagram of a target detection backdoor attack defense system based on discrete cosine transform according to a preferred embodiment of the present invention.

Detailed Implementation

[0042] The present invention will be further described below with reference to the accompanying drawings and specific embodiments.

[0043] As shown in Figures 1 and 3, this embodiment presents a target detection backdoor attack defense method based on discrete cosine transform. First, the training data is preprocessed, and the model is trained using the preprocessed data. After the initial training of the model, it is retrained using a clean dataset that has not been poisoned by the attacker. Finally, the model is tested using benign and poisoned samples. The specific steps of this embodiment are described below:

[0044] Step (1): Obtain the training dataset from public datasets on third-party platforms (such as Kaggle, COCOdataset, etc.), and preprocess all training samples in the training dataset. The specific process is shown in Figures 2 and 4:

[0045] (1-1) The RGB format samples in the original training dataset are converted to YUV format according to the ITU-R BT.601 standard. The calculation process is shown in Equations (1), (2), and (3):

[0046]

[0047] In the YUV color space, Y is the luminance component, which accurately reflects the brightness of an image and encompasses a large amount of visual information. U and V are the chrominance components; U captures the difference between blue and luminance, while V represents the deviation of red from luminance. Together, they determine the image's hue, saturation, and other color information.
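Equations (1) to (3) are not reproduced in this text. As a point of reference, a commonly cited form of the ITU-R BT.601 conversion for analog YUV, with its inverse as would be needed in step (1-7), is given below. This is an assumption about which variant is meant: the digital YCbCr form, with different scale factors and offsets, is also widely used under the same standard.

```latex
\begin{aligned}
Y &= 0.299\,R + 0.587\,G + 0.114\,B \\
U &= 0.492\,(B - Y) \approx -0.147\,R - 0.289\,G + 0.436\,B \\
V &= 0.877\,(R - Y) \approx 0.615\,R - 0.515\,G - 0.100\,B \\[6pt]
R &\approx Y + 1.140\,V \\
G &\approx Y - 0.395\,U - 0.581\,V \\
B &\approx Y + 2.032\,U
\end{aligned}
```

In this form U is proportional to the difference between blue and luminance and V to the difference between red and luminance, which matches the description of the chrominance components above.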

[0048] (1-2) Divide the YUV image obtained after color space conversion into R×R image blocks. This reduces the computational complexity of the DCT transform and allows the blocks to be processed in parallel, improving computational efficiency. Furthermore, the DCT exhibits energy concentration characteristics: after block division, the energy of each block is concentrated in a few low-frequency coefficients. This confines denoising errors to a single block during subsequent denoising, enhancing fault tolerance and ensuring that processed samples can still effectively train the model.
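The block division in step (1-2), and the position-preserving merge it implies for step (1-6), can be sketched with NumPy reshapes. This is a minimal sketch under stated assumptions: it handles one channel at a time, requires the image dimensions to be exact multiples of the block size, and uses the hypothetical helper names `split_blocks` and `merge_blocks`. How the patent handles images that do not divide evenly is not stated.

```python
import numpy as np

def split_blocks(channel, bh, bw):
    """Split an (H, W) channel into a grid of (bh, bw) blocks indexed as [i, j]."""
    h, w = channel.shape
    if h % bh or w % bw:
        raise ValueError("image size must be a multiple of the block size")
    # Axes after reshape: (block row i, row in block, block col j, col in block).
    return channel.reshape(h // bh, bh, w // bw, bw).swapaxes(1, 2)

def merge_blocks(blocks):
    """Inverse of split_blocks: stitch blocks back at their original positions."""
    gi, gj, bh, bw = blocks.shape
    return blocks.swapaxes(1, 2).reshape(gi * bh, gj * bw)

# Illustrative 16x24 channel split into 8x8 blocks (a 2x3 grid of blocks).
img = np.arange(16 * 24, dtype=np.float64).reshape(16, 24)
blocks = split_blocks(img, 8, 8)
restored = merge_blocks(blocks)
print(blocks.shape, np.array_equal(restored, img))
```

Because each block is an independent slice of the array, per-block operations can be applied to the whole `blocks` array at once or distributed across workers.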

[0049] (1-3) Perform DCT transformation on each block obtained from the block division. The calculation formula is shown in Equation (4):

[0050]

[0051] where the symbols in Equation (4) denote, respectively, the image of each block and the corresponding coefficient sequence; N and M are the sizes of each block; and the two scaling factors are given by Equations (5) and (6):

[0052]

[0053] The DCT coefficients are normalized to ensure that coefficients at different frequencies have the same order of magnitude during the transformation.
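Equations (4) to (6) are not reproduced in this text. For orientation, the standard orthonormal two-dimensional DCT-II of an N×M block is written out below. The symbols f, F, and α are assumptions chosen for this illustration and may differ from the patent's own notation.

```latex
F(u,v) = \alpha_N(u)\,\alpha_M(v)
\sum_{x=0}^{N-1}\sum_{y=0}^{M-1}
f(x,y)\,
\cos\!\left[\frac{(2x+1)\,u\,\pi}{2N}\right]
\cos\!\left[\frac{(2y+1)\,v\,\pi}{2M}\right],
\qquad
\alpha_K(k) =
\begin{cases}
\sqrt{1/K}, & k = 0 \\
\sqrt{2/K}, & k \neq 0
\end{cases}
```

Here f(x, y) is the pixel value at position (x, y) in the block and F(u, v) is the coefficient at frequency index (u, v). With these scaling factors the transform is orthonormal, so all coefficients share a common scale and the inverse transform uses the same basis functions.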

[0054] (1-4) Perform noise reduction on the DCT coefficient sequence obtained for each block based on the denoising coefficient β. For a block of size N×M, the subset of its DCT coefficients determined by the denoising coefficient β is retained, and the remaining coefficients are set to 0. The calculation process is shown in Equation (7):

[0055]
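Equation (7) is not reproduced in this text, so the exact set of retained coefficients is unknown. The sketch below shows one plausible reading of steps (1-3) to (1-5), under the loudly stated assumption that the retained coefficients are the low-frequency corner of size ceil(βN) × ceil(βM). The helper names `dct_matrix` and `denoise_block` are hypothetical, and the DCT is the standard orthonormal DCT-II built with NumPy only.

```python
import math
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ x is the 1-D DCT of x."""
    k = np.arange(n).reshape(-1, 1)  # frequency index
    x = np.arange(n).reshape(1, -1)  # sample index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)       # scaling factor for the zero-frequency row
    return c

def denoise_block(block, beta):
    """DCT, keep an assumed low-frequency corner, then inverse DCT."""
    n, m = block.shape
    cn, cm = dct_matrix(n), dct_matrix(m)
    coeffs = cn @ block @ cm.T                  # forward 2-D DCT
    # ASSUMPTION: retain indices u < ceil(beta*N) and v < ceil(beta*M).
    keep_u = min(n, max(1, math.ceil(beta * n)))
    keep_v = min(m, max(1, math.ceil(beta * m)))
    mask = np.zeros_like(coeffs)
    mask[:keep_u, :keep_v] = 1.0
    return cn.T @ (coeffs * mask) @ cm          # inverse 2-D DCT

rng = np.random.default_rng(0)
blk = rng.random((8, 8))
out = denoise_block(blk, 0.2)
print(out.shape)
```

Since the transform is orthonormal, zeroing coefficients can only remove energy from the block. With β = 1 every coefficient is kept and the block is returned unchanged up to rounding.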

[0056] (1-5) Perform an inverse transform on the denoised DCT coefficient sequence of each block to obtain the image of each block;

[0057] (1-6) The R×R blocks obtained after DCT inverse transformation are merged and stitched together according to their positions before segmentation to form a preprocessed YUV image;

[0058] (1-7) Convert the stitched YUV image to RGB format to obtain a sample image. The same standard is used as in step (1-1) to convert the YUV format sample image to RGB format. The calculation process is shown in equations (9), (10), and (11):

[0059]

[0060] Step (2) Model Training: Train the existing YOLOv5 model using the preprocessed training dataset. The preprocessed dataset preserves various features from the original dataset, so the model does not miss key information because of data preprocessing. Furthermore, the denoising stage during preprocessing filters out some information irrelevant to semantics, which helps prevent the model from learning potential trigger features and enhances its resistance to potential backdoor attacks. The specific training process is as follows:

[0061] The preprocessed training dataset is input into the well-known YOLOv5 model. First, the input image is scaled and tensorized to meet the model's input size requirements. Then, the input image is sequentially passed through the backbone network of the YOLOv5 model for feature extraction. Multi-scale feature information in the image is extracted through convolutional layers and residual structures. The extracted features are further input into the neck network, where features of different scales are fused through a feature pyramid structure to enhance the model's ability to detect targets of different sizes. Finally, the fused features are input into the head, which outputs the target category, bounding box position, and target confidence information.

[0062] During training, a loss function is calculated based on the model output and the real annotation information. The loss function includes bounding box regression loss, target confidence loss and classification loss. The calculation process is as shown in Equation (12). Based on the loss function, the gradient of the model parameters is calculated by the backpropagation algorithm, and the model parameters are iteratively updated by the well-known stochastic gradient descent optimizer until the preset training rounds or loss convergence conditions are reached, and the target detection model after preliminary training is obtained.

[0063]

[0064] where the terms of Equation (12) are, in order, the overall loss function, the bounding box regression loss, the target confidence loss, and the classification loss, together with the weighting coefficients for each loss term.
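Equation (12) is not reproduced in this text. Based on the description of its terms, a weighted multi-task loss of this kind typically takes the form below. The symbol names are assumptions made for illustration, not the patent's notation.

```latex
L = \lambda_{\mathrm{box}}\,L_{\mathrm{box}}
  + \lambda_{\mathrm{obj}}\,L_{\mathrm{obj}}
  + \lambda_{\mathrm{cls}}\,L_{\mathrm{cls}}
```

Here L_box is the bounding box regression loss, L_obj the target confidence loss, L_cls the classification loss, and each λ is the weighting coefficient of the corresponding term.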

[0065] Step (3) Model Retraining: After the initial training of the YOLOv5 model, a clean dataset that has not been poisoned by attackers is used to conduct a second training of the YOLOv5 model. This allows the model to refocus on the previously missing key features, deeply mine the rich information contained in the clean dataset, eliminate the negative impact of missing key features on model performance, and gradually improve model performance. During the second training process, clean data is input into the model and trained in the same way as in step (2) to obtain the final defense model.

[0066] Step (4) Model testing: Test the model using benign samples and poisoned samples in sequence. First, use benign samples to determine the baseline for normal model output, and then observe whether the model exhibits specific abnormal output when poisoned samples are input, to evaluate the backdoor attack defense effect.

[0067] The performance of this invention is evaluated below. Specifically, the MNIST, CIFAR-10, GTSRB, and CelebA datasets are used on ResNet, and three representative attack methods, FIBA, WaNet, and BppAttack, are used as benchmarks. The defense process sets the following parameters in the preprocessing stage: in the image segmentation process, the YUV image is divided into 8×8 image blocks; in the denoising process, the denoising coefficient is set to 0.2.

[0068] The retraining phase is configured as follows:

[0069] Table 1 Hyperparameter Settings

[0070]

[0071] Attack Success Rate (ASR) and Benign Accuracy (BA) are used as evaluation metrics, and the calculation formulas are shown in Equations 12 and 13:

[0072]

[0073] where, for ASR, the numerator is the number of poisoned samples for which the poisoned model outputs the preset incorrect result and the denominator is the total number of poisoned samples; for BA, the numerator is the number of benign samples that are predicted correctly and the denominator is the total number of benign samples.
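The two metrics described above are simple ratios and can be sketched as follows. This is a minimal illustration assuming class-label predictions, as in the classification datasets used for the reported experiments. The function names and the sample data are hypothetical, and a detection setting would need an additional rule for matching predicted boxes to ground truth.

```python
def attack_success_rate(poisoned_predictions, target_output):
    """Fraction of poisoned samples for which the model gives the attacker's preset output."""
    hits = sum(1 for p in poisoned_predictions if p == target_output)
    return hits / len(poisoned_predictions)

def benign_accuracy(benign_predictions, labels):
    """Fraction of benign samples that the model predicts correctly."""
    correct = sum(1 for p, y in zip(benign_predictions, labels) if p == y)
    return correct / len(labels)

# Hypothetical toy data: the attacker's preset output is class 0.
asr = attack_success_rate([0, 0, 3, 0], target_output=0)
ba = benign_accuracy([1, 2, 3, 4], labels=[1, 2, 3, 5])
print(asr, ba)
```

A successful defense drives ASR down while keeping BA close to that of a clean model.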

[0074] The results of the FIBA attack defense experiment are shown in Table 2:

[0075] Table 2. Results of the defense experiment against FIBA attacks.

[0076]

[0077] Regarding the ASR metric, the MNIST dataset had an ASR of 99.41% before defense and dropped to 12.05% after defense; the CIFAR-10 dataset had an ASR of 99.72% before defense and dropped to 10.19% after defense; the GTSRB dataset had an ASR of 99.95% before defense and dropped to 11.07% after defense; and the CelebA dataset had an ASR of 99.93% before defense and dropped to 10.99% after defense. The ASR of all datasets decreased significantly after defense, indicating that the defense measures proposed in this invention have a strong inhibitory effect on attacks.

[0078] Regarding the BA (Benign Accuracy) metric, the BA of the MNIST dataset decreased from 97.52% to 96.92% after the defense, a drop of only 0.6 percentage points; the CIFAR-10 dataset decreased from 96.11% to 95.17%, a drop of 0.94 percentage points; the GTSRB dataset decreased from 98.92% to 98.04%, a drop of 0.88 percentage points; and the CelebA dataset decreased from 80.07% to 78.52%, a drop of 1.55 percentage points. The defense had a relatively small impact on model accuracy, and the BA after the defense remained close to the higher BA of the clean model, indicating that the defense proposed in this invention has little impact on the performance of the model itself.

[0079] The results of the defense experiments against WaNet attacks are shown in Table 3:

[0080] Table 3. Experimental results of defense against WaNet attacks

[0081]

[0082] Regarding the ASR metric, the MNIST dataset had an ASR of 99.86% before defense and dropped to 10.63% after defense; the CIFAR-10 dataset had an ASR of 99.55% before defense and dropped to 10.07% after defense; the GTSRB dataset had an ASR of 98.78% before defense and dropped to 10.19% after defense; and the CelebA dataset had an ASR of 99.33% before defense and dropped to 9.88% after defense. The ASR of all datasets decreased significantly after defense, indicating that the defense method proposed in this invention has a very strong inhibitory effect on WaNet attacks.

[0083] Regarding the BA (Benign Accuracy) metric, the BA of the MNIST dataset decreased from 99.52% to 98.32% after the defense, a drop of 1.2 percentage points; the CIFAR-10 dataset decreased from 94.15% to 93.01%, a drop of 1.14 percentage points; the GTSRB dataset decreased from 98.97% to 96.05%, a drop of 2.92 percentage points; and the CelebA dataset decreased from 78.99% to 76.93%, a drop of 2.06 percentage points. Although BA decreased, it remained close to that of the clean model, indicating that the defense method proposed in this invention has little impact on the model accuracy.

[0084] The results of the defense experiments against BppAttack attacks are shown in Table 4:

[0085] Table 4. Experimental results of defense against BppAttack attacks

[0086]

[0087] Regarding the ASR metric, the MNIST dataset had an ASR of 99.79% before defense and dropped to 9.41% after defense; the CIFAR-10 dataset had an ASR as high as 99.91% before defense and dropped to 9.94% after defense; the GTSRB dataset had an ASR close to 100% (99.96%) before defense and dropped to 9.78% after defense; and the CelebA dataset had an ASR of almost 100% (99.99%) before defense and dropped to 10.32% after defense. The ASR of all datasets decreased significantly after defense, indicating that the defense method proposed in this invention has strong resistance to BppAttack attacks.

[0088] Regarding the BA (Benign Accuracy) metric, the BA of the MNIST dataset decreased from 99.36% to 99.07% after the defense, a drop of 0.29 percentage points; the CIFAR-10 dataset decreased from 94.54% to 93.79%, a drop of 0.75 percentage points; the GTSRB dataset decreased from 99.25% to 97.06%, a drop of 2.19 percentage points; and the CelebA dataset decreased from 79.06% to 76.97%, a drop of 2.09 percentage points. Although BA decreased, it remained close to that of the clean model, indicating that the defense method proposed in this invention has little impact on the model accuracy.

[0089] In defense experiments against attacks such as FIBA, WaNet, and BppAttack, the target detection backdoor attack defense method based on discrete cosine transform of this invention demonstrated excellent defense performance, fully showcasing its outstanding defensive capabilities. This invention ensures that the inherent performance of the model itself is not affected, while making backdoor attack methods highly likely to fail.

[0090] As shown in Figure 5, this embodiment discloses a target detection backdoor attack defense system based on discrete cosine transform, used to execute the above method, which includes the following modules:

[0091] Data preprocessing module: Obtains the training dataset from the public dataset and performs preprocessing operations on all training samples in the training dataset;

[0092] Model training module: Trains the object detection model using a pre-processed training dataset;

[0093] Model retraining module: Using a clean dataset that has not been attacked, the model is retrained to refocus on the key features that were missing before preprocessing and to extract the information contained in the clean dataset.

[0094] Model testing module: The model is tested sequentially using benign samples and poisoned samples.

[0095] Other aspects of this embodiment can be found in the above method embodiments.

[0096] The above detailed embodiments have provided a comprehensive description of the present invention, but the present invention is not limited to the described embodiments. For those skilled in the art, various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the present invention, and these variations still fall within the protection scope of the present invention.

Claims

1. A target detection backdoor attack defense method based on discrete cosine transform, characterized in that the specific steps are as follows: Step (1): Obtain the training dataset from the public dataset and perform preprocessing operations on all training samples in the training dataset; Step (2): Train the object detection model using the preprocessed training dataset; Step (3): Use a clean dataset that has not been attacked to conduct a second training of the model, so that the model refocuses on the key features missing in step (1) and mines the information contained in the clean dataset.

2. The target detection backdoor attack defense method based on discrete cosine transform as described in claim 1, characterized in that step (1) is as follows: (1-1) Convert the RGB format training samples in the original training dataset to YUV format sample images; (1-2) Divide the YUV format sample images obtained after color space conversion into R×R image blocks, where i and j are the two-dimensional index coordinates of each image block; (1-3) Perform DCT transformation on each block obtained from the block division, and normalize the DCT coefficients so that coefficients of different frequencies have the same order of magnitude in the transformation; (1-4) Denoise the DCT coefficient sequence obtained for each block using the denoising coefficient β, where u and v are the two-dimensional frequency indices of the frequency domain coefficients after DCT transformation; (1-5) Perform an inverse transform on the denoised DCT coefficient sequence of each block to obtain the image of each block; (1-6) Merge and stitch together the R×R blocks obtained after DCT inverse transformation according to their positions before segmentation to form a preprocessed YUV image; (1-7) Convert the stitched YUV image back to the RGB color space to obtain a sample image in RGB format.

3. The target detection backdoor attack defense method based on discrete cosine transform as described in claim 1, characterized in that, Step (2) train the model using the preprocessed training dataset.

4. The target detection backdoor attack defense method based on discrete cosine transform as described in claim 1, characterized in that, Step (3): After the initial training of the model is completed, the model is trained a second time using a clean dataset that has not been poisoned by the attacker.

5. The target detection backdoor attack defense method based on discrete cosine transform as described in claim 1, characterized in that, In step (3), the key features include high-frequency texture and edge detail features, local contrast and small target features, color accuracy loss introduced by color space conversion, contextual features at the boundaries of image blocks, and weak semantic features in low signal-to-noise ratio regions.

6. The target detection backdoor attack defense method based on discrete cosine transform as described in claim 1, characterized in that, in step (3), the information contained in the clean dataset includes the features lost in the preprocessing stage and the complete semantic and geometric information that the target detection task itself needs to learn through deep learning.

7. The target detection backdoor attack defense method based on discrete cosine transform as described in any one of claims 1-5, characterized in that, after step (3), step (4) is performed, in which the model is tested using benign samples and poisoned samples in sequence.

8. A target detection backdoor attack defense system based on discrete cosine transform, used to perform the method as described in any one of claims 1-7, characterized in that, Includes the following modules: Data preprocessing module: Obtains the training dataset from the public dataset and performs preprocessing operations on all training samples in the training dataset; Model training module: Trains the object detection model using a pre-processed training dataset; Model retraining module: Using a clean dataset that has not been attacked, the model is retrained to refocus on the key features that were missing before preprocessing and to extract the information contained in the clean dataset.