Deep learning model security evaluation system and method

The patent describes a deep learning model security evaluation system that addresses the insufficient coverage of attack methods in existing platforms. It provides unified testing and defense against adversarial samples, backdoors, and data poisoning, offers a comprehensive security evaluation, enhances model robustness, and makes model performance easier for users to understand.

CN118279732B · Active · Publication Date: 2026-06-23 · SHANGHAI JIAOTONG UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SHANGHAI JIAOTONG UNIV
Filing Date: 2024-04-11
Publication Date: 2026-06-23

IPC: G06V20/00; G06V10/776; G06V10/774; G06V10/82; G06N3/084

AI Technical Summary

Technical Problem

Existing deep learning model security assessment platforms lack comprehensive assessment of data poisoning and adversarial sample attacks, and lack a quantitative evaluation system, making it impossible to effectively assess the robustness and stability of models under various attacks.

Method used

A deep learning model security evaluation system is designed, including a preprocessing module, a robustness testing module, a security defense module, and a comprehensive evaluation module. Through adversarial robustness evaluation, backdoor robustness evaluation, and poisoning robustness evaluation, it provides solutions to enhance model robustness, and it keeps the modules scalable and flexible through a unified interface design.

Benefits of technology

It enables unified testing and defense against various attack methods on deep learning models, provides a comprehensive evaluation index system, allows users to customize weights to obtain evaluation results that meet their needs, simplifies the understanding of model performance, and enhances the robustness and security of the model.

✦ Generated by Eureka AI based on patent content.

Figure CN118279732B_ABST

Abstract

The application provides a deep learning model security evaluation system and method. The method comprises the following steps: S1, collecting information about the model under test and performing preprocessing; S2, testing the model under test to obtain test results, where the tests comprise an adversarial attack test, a data poisoning robustness test, and a backdoor trigger reverse-engineering test; S3, summarizing and analyzing all test results and outputting a final evaluation result. The application provides security analysis for trainers of deep learning models and integrates multiple attack algorithms, including adversarial sample attacks, backdoor attacks, and data poisoning attacks. By analyzing model performance before and after attack and defense, it lets users clearly see how their model's performance changes under each attack type, evaluate the model comprehensively on a unified platform, and better grasp its security.

Description

Technical Field

[0001] This invention relates to the field of deep learning technology, and more specifically, to a deep learning model security assessment system and method.

Background Technology

[0002] Deep learning (DL) models are a class of machine learning models based on artificial neural networks. They learn complex functions through multiple layers of nonlinear transformations and mainly consist of an input layer, hidden layers, and an output layer. Each layer transforms its input into a more abstract representation, enabling the network to learn high-level features from the data. DL models are trained by initializing the network parameters and then iterating over forward propagation, loss computation, backpropagation, and weight updates by an optimization algorithm. They are widely used in image recognition and processing, speech and audio processing, natural language processing, and other fields.

[0003] Security is a critical concern for deep learning (DL) models. During the data collection phase, DL models are vulnerable to data poisoning attacks, where attackers manipulate the training data distribution by inserting carefully crafted samples to alter model behavior and reduce performance. During training, they are susceptible to backdoor attacks, where attackers implant backdoors into the DL model by selecting sub-tasks and benign main tasks. During testing, they are vulnerable to adversarial attacks, where attackers add subtle, imperceptible perturbations to clean samples, causing the DL model to output arbitrary, erroneous results with high confidence. To address these three types of attacks, poisoning defense algorithms, backdoor detection and defense algorithms, and adversarial example detection and defense algorithms are required. Evaluating the security of DL models necessitates considering their resistance to various attacks and their stability under anomalous inputs or conditions. Therefore, building a comprehensive platform for testing attacks, assisting in defense, and evaluating model robustness is essential for ensuring the security of deep learning.

[0004] Existing defense and security assessment platforms for deep learning models include the BackdoorBench platform established by the Chinese University of Hong Kong, Shenzhen and the Shenzhen Big Data Research Institute, and the DEEPSEC platform established by Zhejiang University in 2019. The former integrates various methods for backdoor attacks and defenses, and has established platforms in both image processing and natural language processing fields, but it does not cover data poisoning and adversarial example attacks and defenses, nor does it propose a comprehensive evaluation model system; the latter integrates relevant attack and defense algorithms and provides some evaluation indicators, but it lacks a quantitative evaluation system.

[0005] Chinese patent document CN113127857B discloses a method for defending against adversarial attacks using a deep learning model. The method includes: acquiring the original deep learning model to be trained and the original training data; constructing a transformation layer; transforming the original training data and supplementing it with additional data to form training data; training the original deep learning model to obtain a deep learning model; inserting the transformation layer after the input layer of the deep learning model to obtain a highly robust deep learning model; monitoring the input data while the highly robust deep learning model is working, and using the transformation layer in the deep learning model for corresponding defense. While this patent document improves the model's robustness against various adversarial examples and ensures that the model's accuracy is not affected, it still cannot evaluate the model.

Summary of the Invention

[0006] In view of the deficiencies in the prior art, the purpose of this invention is to provide a deep learning model security evaluation system and method.

[0007] A deep learning model security assessment system provided by the present invention includes:

[0008] A preprocessing module, a robustness testing module, a security defense module, and a comprehensive evaluation module.

[0009] The preprocessing module converts the model under test into a type that other modules can read;

[0010] The robustness testing module includes an adversarial robustness assessment module, a backdoor robustness assessment module, and a poisoning robustness assessment module.

[0011] The security defense module provides a solution to enhance the robustness of the model under test;

[0012] The comprehensive evaluation module summarizes the evaluation indicators of all modules and outputs an overall evaluation of the model under test.

[0013] Preferably, the robustness testing module sequentially performs adversarial robustness evaluation, backdoor robustness evaluation, and poisoning robustness evaluation on the model under test to obtain evaluation results; the adversarial robustness evaluation includes examining the model's performance when faced with an attacker who has mastered a white-box model; the backdoor robustness evaluation includes checking whether a backdoor was inserted into the model during training; the poisoning robustness evaluation includes examining the model's performance when faced with a poisoned dataset.

[0014] Preferably, the adversarial robustness assessment module uses the FGSM method to detect whether the model under test can give a correct judgment result when facing malicious samples.

[0015] Preferably, the adversarial robustness assessment module searches, within a preset perturbation range, for the attack input that maximizes the probability of model misjudgment. Under the same method and the same perturbation level, a small decrease in output accuracy indicates strong robustness, and a large decrease indicates weak robustness.

[0016] Preferably, the backdoor robustness evaluation module detects whether the model under test has been attacked based on the Neural Cleanse method. If so, it performs a backdoor removal operation.

[0017] Preferably, the backdoor robustness evaluation module includes reconstructing the backdoor trigger, filtering backdoor categories based on the pattern and size of the backdoor trigger, and using the median absolute deviation index to measure outliers to obtain anomaly indicators.

[0018] Preferably, the poisoning robustness assessment module uses the Poison Frogs method to examine the extent to which the normal working ability of the model under test is disrupted when faced with malicious datasets.

[0019] Preferably, the security defense module includes removing backdoors and improving data poisoning robustness. Removing backdoors includes: for the labels in the original dataset that have been attacked by backdoors, randomly selecting a certain proportion of images, injecting backdoor triggers into them without changing their labels, mixing them with the remaining unmodified original data, and retraining the model. Improving data poisoning robustness includes generating a unique noise for each piece of data in the preset dataset based on the FriendlyNoise method, and using the data with superimposed noise when training the model.

[0020] Preferably, the comprehensive evaluation module summarizes the evaluation results of all modules and gives an overall evaluation of the model through weighted calculation.

[0021] A method for security assessment of deep learning models provided by the present invention includes:

[0022] Step S1: Collect information about the model under test and perform preprocessing;

[0023] Step S2: Test the model to be tested and obtain the test results;

[0024] The tests include adversarial attack testing, data poisoning robustness testing, and backdoor trigger reverse engineering testing.

[0025] Step S3: Summarize and analyze all test results, and output the final evaluation results.

[0026] Compared with the prior art, the present invention has the following beneficial effects:

[0027] 1. All modules of this invention adopt a unified interface design, which greatly simplifies the workload of adding new algorithms and modules in the future. This allows the invention to flexibly adapt to future technological developments and changes in market demand, and also allows users to add their own personalized modules with less difficulty and obtain evaluation results that better meet their needs. It has strong scalability.

[0028] 2. The comprehensive evaluation module provided by this invention not only analyzes and integrates the evaluation results of each independent module, but also provides users with a clear overall score. This intuitive and simplified scoring method allows users to understand the model performance at a glance, greatly reducing the difficulty for users to understand the model performance and enabling users to easily grasp the overall performance of the model. At the same time, users can customize the weights of each module, so that the results can be more inclined to show the aspects that users care about, which is more in line with the user's usage goals and has a high degree of flexibility.

[0029] 3. In addition to attack testing, this invention also provides a security defense module. Based on the detection results of data poisoning and backdoors, it implements a defense algorithm on the model to enhance its robustness against poisoning and backdoor attacks, removes certain backdoors, and intuitively shows the difference in robustness performance before and after the model is defended. This not only helps users strengthen the model, but also helps users better understand the model's security performance.

[0030] 4. The purpose of this invention is to provide security performance analysis for trainers of deep learning models. Currently, most integrated platforms for deep learning attack / defense methods focus on the effectiveness of attack / defense algorithms, but lack analysis of the models used; furthermore, other platforms focus more on the implementation under a single attack method, lacking comprehensiveness. This invention aims to provide trainers of deep learning models with security analysis of the trained models, achieving the integration of multiple attack algorithms, including adversarial example attacks, backdoor attacks, and data poisoning attacks. Analyzing the model's performance before and after attacks and defenses allows users to clearly understand the changes in their model's performance under various attack types. By using this invention, users can comprehensively evaluate their models on a unified platform, better grasping the model's security.

[0031] Other beneficial effects of the present invention will be explained in detail through the introduction of specific technical features and technical solutions in specific embodiments. Those skilled in the art should be able to understand the beneficial technical effects brought about by these technical features and technical solutions through the introduction of these technical features and technical solutions.

Attached Figure Description

[0032] Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments with reference to the accompanying drawings:

[0033] Figure 1 is a system block diagram of the present invention.

[0034] Figure 2 is a flowchart of the method of the present invention.

Detailed Implementation

[0035] The present invention will now be described in detail with reference to specific embodiments. These embodiments will help those skilled in the art to further understand the present invention, but do not limit the invention in any way. It should be noted that those skilled in the art can make several changes and improvements without departing from the concept of the present invention. These all fall within the protection scope of the present invention.

[0036] Currently, there is still no standard and effective method for testing and evaluating the security of deep learning models. This invention proposes a deep learning model security evaluation system consisting of six modules: a preprocessing module, an adversarial robustness evaluation module, a backdoor robustness evaluation module, a poisoning robustness evaluation module, a security defense module, and a comprehensive evaluation module. The model under test is evaluated by testing it: adversarial attack tests are performed first, followed by backdoor attack tests and poisoning attack tests. All test results are then summarized to derive an overall evaluation of the model under test, giving testers a comprehensive understanding of the model's security status.

[0037] As shown in Figure 1, this security assessment system consists of six modules: a preprocessing module, an adversarial robustness assessment module, a backdoor robustness assessment module, a poisoning robustness assessment module, a security defense module, and a comprehensive assessment module. The system's innovation lies in its unified testing of the three mainstream attack methods against deep learning models and its provision of model defense solutions, while also constructing a comprehensive evaluation index system to measure model security.

[0038] The preprocessing module addresses the issue of low support for multiple scenarios and models in previous evaluation tools. Based on the preprocessing of the model and the specified dataset, the model to be tested is read in a manner supported by the subsequent evaluation module.

[0039] The three main attack methods that deep learning models face are adversarial attacks, backdoor attacks, and poisoning attacks. Attacked models exhibit characteristics such as difficulty in detecting attacks and vulnerability. The adversarial robustness evaluation module, backdoor robustness evaluation module, and poisoning robustness evaluation module utilize existing mature algorithms to test the model's resistance to these attack methods and verify whether the model has already been attacked.

[0040] The security defense module provides solutions to enhance model robustness. For models vulnerable to backdoor attacks, detecting the backdoor trigger and performing additional training eliminates the backdoor's impact without significantly reducing normal operating capability. For models susceptible to data poisoning, data poisoning robustness enhancement algorithms are deployed to improve the model's poisoning robustness.

[0041] The comprehensive evaluation module integrates the test results of the first four modules, each of which proposes evaluation indicators corresponding to its test method. The results are weighted in this module to obtain an overall evaluation of the model, which is more intuitive and more useful as a reference, and makes it easier for testers to have a comprehensive understanding of the model's safety status.

[0042] All modules of this invention employ a unified interface design, which greatly simplifies the workload of adding new algorithms and modules in the future. This allows the invention to flexibly adapt to future technological developments and changes in market demands, and also enables users to easily add their own personalized modules to obtain evaluation results that better suit their needs. The comprehensive evaluation module not only analyzes and integrates the evaluation results of each independent module, but also provides users with a clear overall score. This intuitive and simplified scoring method allows users to understand the model's performance at a glance, greatly reducing the difficulty of understanding model performance and allowing users to easily grasp the overall performance of the model. At the same time, users can customize the weights of each module, making the results more focused on aspects that users care about, and better aligning with their usage goals.
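The unified interface idea can be illustrated with a minimal Python sketch. The patent text does not disclose the actual interface, so the class names (`EvaluationModule`, `AccuracyModule`), the `run` signature, and the `module.metric` key format below are all hypothetical choices made for illustration only.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class EvaluationModule(ABC):
    """Hypothetical common interface shared by every test or defense module."""

    name: str = "unnamed"

    @abstractmethod
    def run(self, model: Any, dataset: Any) -> Dict[str, float]:
        """Return a mapping from metric name to metric value."""


class AccuracyModule(EvaluationModule):
    """Toy user-defined module: plain accuracy on (input, label) pairs."""

    name = "toy_accuracy"

    def run(self, model: Any, dataset: Any) -> Dict[str, float]:
        correct = sum(1 for x, y in dataset if model(x) == y)
        return {"accuracy": correct / len(dataset)}


def run_all(modules: List[EvaluationModule], model: Any, dataset: Any) -> Dict[str, float]:
    """Run every registered module and collect metrics as 'module.metric' keys."""
    results: Dict[str, float] = {}
    for module in modules:
        for metric, value in module.run(model, dataset).items():
            results[f"{module.name}.{metric}"] = value
    return results


def toy_model(x: float) -> int:
    # Stand-in for the model under test: classify by sign.
    return 1 if x >= 0 else 0


toy_data = [(-2.0, 0), (-1.0, 0), (0.5, 1), (3.0, 1)]
report = run_all([AccuracyModule()], toy_model, toy_data)
```

Under this assumed design, adding a new algorithm only requires a new subclass with a `run` method; the aggregation code stays unchanged.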

[0043] In addition to attack testing, this invention also implements a security defense module. Based on the detection results of data poisoning and backdoors, it implements a defense algorithm on the model to enhance its robustness against poisoning and backdoor attacks, removes certain backdoors, and intuitively shows the difference in robustness performance before and after the model is defended. This not only helps users strengthen the model, but also helps users better understand the model's security performance.

[0044] The above are basic embodiments of the present invention. The technical solution of the present invention will be further described below through a preferred embodiment.

[0045] Example 1

[0046] As shown in Figure 2, this embodiment uses an image classification model as an example to introduce the workflow and effects of this system. The robustness testing module is a combination of the adversarial robustness evaluation module, the backdoor robustness evaluation module, and the poisoning robustness evaluation module. For the model to be evaluated, the tests of these three sub-modules need to be executed sequentially to obtain the evaluation results of the original model. Adversarial attack testing verifies the performance of the model when facing an attacker who has mastered the white-box model; backdoor trigger detection checks whether a backdoor has been inserted into the model during training; data poisoning testing verifies how the model performs when facing a poisoned dataset.

[0047] Specifically,

[0048] 1. Adversarial attack test

[0049] This section implements the adversarial attack method as a separate module, supporting various adversarial attack testing methods. Adversarial attacks test whether a model can make correct judgments when faced with malicious samples. Malicious samples refer to model inputs designed using specific adversarial attack methods; they are almost indistinguishable from normal samples to human observers, but can deceive the model into making incorrect judgments. This invention primarily uses the classic and effective adversarial attack testing method FGSM (fast gradient sign method) for untargeted attacks. By executing this method, the attack input with the highest probability of misjudgment within a specified perturbation range can be found, thus forcing the model to misjudge. For a highly secure model, under the same method and perturbation level, the output accuracy should decrease only slightly, indicating strong adversarial robustness. Conversely, if the model experiences a significant performance degradation after being attacked, its adversarial robustness is weak.
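As a rough illustration of the untargeted FGSM test described above, the sketch below attacks a tiny logistic regression classifier in plain Python. The model, weights, data, and perturbation level `eps` are made-up assumptions; the patent targets deep image classifiers, for which the gradient would come from backpropagation rather than the closed form used here.

```python
import math
from typing import List


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    """Probability of class 1 under logistic regression."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm(w: List[float], b: float, x: List[float], y: int, eps: float) -> List[float]:
    """One-step untargeted FGSM: x_adv = x + eps * sign(dLoss/dx).

    For logistic regression with cross-entropy loss, dLoss/dx = (p - y) * w.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


def accuracy(w: List[float], b: float, xs: List[List[float]], ys: List[int]) -> float:
    hits = sum(1 for x, y in zip(xs, ys) if (predict_proba(w, b, x) >= 0.5) == bool(y))
    return hits / len(xs)


# Toy white-box model and data (illustrative values only).
w, b = [2.0, -1.0], 0.0
xs = [[0.3, 0.1], [1.0, -1.0], [-0.2, 0.2], [-1.0, 1.0]]
ys = [1, 1, 0, 0]
eps = 0.3

clean_acc = accuracy(w, b, xs, ys)
adv_xs = [fgsm(w, b, x, y, eps) for x, y in zip(xs, ys)]
adv_acc = accuracy(w, b, adv_xs, ys)
accuracy_drop = clean_acc - adv_acc  # smaller drop means stronger robustness
```

In this toy setting the two samples closest to the decision boundary are flipped, so the accuracy drop at a fixed `eps` serves as the robustness indicator in the sense described above.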

[0050] Beyond proactive, untargeted adversarial attacks, this embodiment also evaluates the robustness of the model itself to input in this module. Taking an image classification model as an example, in practical use, the input images may not be standard. The model needs to handle situations such as imperfect image clarity (noise superimposed on the original image) and processed images (compressed or enlarged images), and still maintain good performance. This is a crucial aspect of evaluating model security. Specifically, this invention evaluates whether the model can still function normally after Gaussian noise is superimposed on the image, Gaussian blur is applied to the image, and the image is compressed.
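The input-robustness check can be sketched in the same spirit. The example below covers only the Gaussian noise case; the flattened-list image representation, the brightness classifier, and the noise level are hypothetical stand-ins, and Gaussian blur or compression would plug in as further `corrupt` functions.

```python
import random
from typing import Callable, List, Tuple

Image = List[float]  # flattened grayscale image, pixel values in [0, 1]


def add_gaussian_noise(image: Image, sigma: float, rng: random.Random) -> Image:
    """Superimpose zero-mean Gaussian noise and clip back to [0, 1]."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in image]


def accuracy_under(
    model: Callable[[Image], int],
    data: List[Tuple[Image, int]],
    corrupt: Callable[[Image], Image],
) -> float:
    """Accuracy of the model when every input is passed through `corrupt`."""
    hits = sum(1 for image, label in data if model(corrupt(image)) == label)
    return hits / len(data)


def brightness_classifier(image: Image) -> int:
    # Toy stand-in for the model under test: bright (1) versus dark (0).
    return 1 if sum(image) / len(image) > 0.5 else 0


data = [([0.9, 0.8, 0.95, 0.85], 1), ([0.1, 0.2, 0.05, 0.15], 0)]
rng = random.Random(0)  # fixed seed so the run is repeatable

clean_acc = accuracy_under(brightness_classifier, data, lambda im: im)
noisy_acc = accuracy_under(
    brightness_classifier, data, lambda im: add_gaussian_noise(im, 0.05, rng)
)
```

Comparing `clean_acc` with `noisy_acc` gives one indicator of how well the model tolerates non-standard inputs.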

[0051] 2. Data poisoning robustness test

[0052] This section implements the data poisoning robustness test. Like the adversarial attack testing module, this module supports replacing the evaluation algorithm. Data poisoning testing examines the extent to which a model's normal working ability is disrupted when it faces malicious datasets. In this invention, the Poison Frogs method is mainly used. The basic principle of this algorithm is to create poisoned data based on feature collisions. In the forward step, the algorithm performs a gradient descent update that reduces the L2 distance between the poisoned sample and the target sample in feature space. In the backward step, the algorithm reduces the Frobenius distance in input space between the result of the forward step and the base sample. The resulting poisoned sample therefore appears consistent with the base sample while lying close to the target sample in feature space.
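The alternating forward and backward updates of a feature-collision attack can be sketched as follows. This is an assumption-laden toy: the feature extractor is a fixed linear map `W` rather than a deep network, and `beta`, `lr`, and `steps` are hypothetical hyperparameters. The backward step uses the closed-form update commonly given for this kind of forward-backward scheme.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def features(W: Matrix, x: Vector) -> Vector:
    """Toy feature extractor f(x) = W x (stand-in for a network's penultimate layer)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def craft_poison(W: Matrix, base: Vector, target: Vector,
                 beta: float = 0.25, lr: float = 0.1, steps: int = 200) -> Vector:
    """Feature collision: stay near `base` in input space, near `target` in feature space."""
    target_feat = features(W, target)
    x = list(base)
    for _ in range(steps):
        # Forward step: gradient descent on ||f(x) - f(target)||^2.
        diff = [f - t for f, t in zip(features(W, x), target_feat)]
        grad = [2.0 * sum(W[r][c] * diff[r] for r in range(len(W))) for c in range(len(x))]
        x_hat = [xi - lr * g for xi, g in zip(x, grad)]
        # Backward step: pull the result back toward the base sample.
        x = [(xh + lr * beta * b) / (1.0 + beta * lr) for xh, b in zip(x_hat, base)]
    return x


W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # features ignore the third input coordinate
base = [0.0, 0.0, 0.0]
target = [1.0, 1.0, 5.0]
poison = craft_poison(W, base, target)
```

Here the poison ends up close to the target in feature space while its third coordinate, which the features ignore, stays identical to the base sample.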

[0053] 3. Backdoor detection

[0054] This section implements the backdoor trigger inversion test. Like the aforementioned modules, this module supports multiple evaluation methods. Backdoor detection tests whether the model has been attacked; if so, it attempts to remove the backdoor. In this invention, the Neural Cleanse method is mainly used for detection. The principle of this algorithm is that flipping inputs to a backdoor category requires less perturbation than flipping them to a normal category. Therefore, the minimum pixel-level perturbation needed to flip the label is computed for every output category, outlier detection is applied to identify potential backdoor categories, and the backdoor triggers are reconstructed by reverse engineering. The algorithm uses gradient descent to minimize the classification loss between samples containing the backdoor trigger and the target category while limiting the size of the trigger, thereby reconstructing a small trigger that still causes misclassification. The Adam optimizer is used to reconstruct the backdoor trigger for each category.

[0055] Backdoor categories are filtered based on the pattern and size of the reconstructed backdoor triggers. Outliers are measured using the median absolute deviation (MAD), which yields an anomaly index for each category. If the anomaly index, scaled to be comparable with a normal distribution, exceeds 1.96, the category is an outlier with a probability greater than 95%.
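The outlier test on reconstructed trigger sizes can be sketched as below. The per-category trigger sizes are invented numbers, and the constant 1.4826 is the standard factor that makes the MAD comparable to a standard deviation for normally distributed data; treating it as the scaling meant by the text is an assumption.

```python
import statistics
from typing import List

CONSISTENCY = 1.4826  # scales MAD to match the standard deviation of normal data


def anomaly_indices(trigger_sizes: List[float]) -> List[float]:
    """Anomaly index of each category: |size - median| / (1.4826 * MAD)."""
    med = statistics.median(trigger_sizes)
    mad = statistics.median([abs(v - med) for v in trigger_sizes])
    if mad == 0:
        return [0.0 for _ in trigger_sizes]  # no spread, nothing stands out
    return [abs(v - med) / (CONSISTENCY * mad) for v in trigger_sizes]


def flag_backdoor_categories(trigger_sizes: List[float], threshold: float = 1.96) -> List[int]:
    """Flag categories whose trigger is abnormally small (below the median)."""
    med = statistics.median(trigger_sizes)
    indices = anomaly_indices(trigger_sizes)
    return [i for i, (v, a) in enumerate(zip(trigger_sizes, indices))
            if v < med and a > threshold]


# Hypothetical L1 norms of the reconstructed trigger for 7 categories.
sizes = [95.0, 102.0, 98.0, 12.0, 105.0, 99.0, 101.0]
flagged = flag_backdoor_categories(sizes)
```

Only abnormally small triggers are flagged, because a backdoor category is the one that needs unusually little perturbation to reach.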

[0056] The security defense module acts on the data poisoning and backdoor detection results from the robustness testing module. It aims to improve the robustness of the model or repair a model that has been attacked, thereby achieving integrated evaluation and defense. It intuitively shows the difference in robustness performance before and after the model is defended, and better helps users understand the security performance of the model.

[0057] 1. Backdoor removal

[0058] This section retrains models that contain backdoors in order to repair them. In this invention, the Neural Cleanse method is used to defend against backdoors: the trigger reconstructed during detection is used for retraining, while the model's prediction accuracy on clean datasets is maintained.

[0059] For the labels in the original dataset that have been attacked by backdoors, a certain proportion of images is randomly selected and injected with the backdoor trigger without changing their labels. These images are then mixed with the remaining unmodified original data, and the model is retrained. After retraining, the backdoor is removed: the success rate of backdoor attacks using these triggers is almost zero. Meanwhile, because a large amount of normal data is present, the model's prediction accuracy on normal data remains high.
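The construction of the retraining set can be sketched as follows. The flattened-list image format, the mask-and-pattern trigger representation, and the 20% fraction are illustrative assumptions; the retraining itself is omitted.

```python
import random
from typing import List, Tuple

Image = List[float]
Sample = Tuple[Image, int]


def stamp_trigger(image: Image, mask: List[float], pattern: List[float]) -> Image:
    """Blend the reconstructed trigger into the image where mask is 1."""
    return [(1.0 - m) * p + m * t for p, m, t in zip(image, mask, pattern)]


def build_unlearning_set(dataset: List[Sample], mask: List[float], pattern: List[float],
                         fraction: float, rng: random.Random) -> List[Sample]:
    """Stamp the trigger on a random fraction of images while keeping their true labels."""
    n_stamp = int(round(fraction * len(dataset)))
    chosen = set(rng.sample(range(len(dataset)), n_stamp))
    return [
        (stamp_trigger(img, mask, pattern), label) if i in chosen else (list(img), label)
        for i, (img, label) in enumerate(dataset)
    ]


# Ten blank 4-pixel images; the trigger overwrites the last pixel.
dataset = [([0.0, 0.0, 0.0, 0.0], i % 2) for i in range(10)]
mask = [0.0, 0.0, 0.0, 1.0]
pattern = [0.0, 0.0, 0.0, 1.0]
retrain_set = build_unlearning_set(dataset, mask, pattern, 0.2, random.Random(0))
```

Because the stamped images keep their correct labels, retraining on this mixture teaches the model that the trigger carries no label information.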

[0060] It is worth noting that the effectiveness of removing backdoors here depends on the similarity between the reconstructed trigger and the real trigger. After finding the backdoor target category, the number of optimization iterations can be increased to obtain a more accurate trigger, thereby improving the defense effect.

[0061] 2. Improving data poisoning robustness

[0062] This section aims to improve the model's resistance to data poisoning. Augmented training can reduce the impact of image perturbations on the model's normal operation. This invention primarily uses the FriendlyNoise method, which generates a unique noise for each piece of data in a specified dataset. The model is trained using this noise-laden data, thereby improving the model's resistance to data poisoning without significantly reducing its performance. The specific principle is as follows:

[0063] The model to be defended is first trained for several rounds on the dataset so that it acquires some working capability. Then, noise is generated for each data point in the dataset, calculated using the following formula:

[0064]

[0065] Where ε is the noise, t is the iteration number, η_opt is the learning rate, ∇_ε denotes the partial derivative with respect to ε, λ is the coefficient that balances the defense effect against the noise level, D_KL is the KL divergence, f_θ is the model to be defended, and x_i is the i-th data point in the dataset. After a certain number of iterations, the generated noise is the defense result. Superimposing this noise onto the original data creates a new dataset that can be used to train a poisoning-robust model. In particular, to ensure that the defense effect generalizes, a small amount of random noise can be added during each training round.
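The formula itself is not reproduced in this text. One update rule consistent with the symbols defined above is given below as a hedged reconstruction, not as the patent's exact expression. It assumes that the noise is optimized by gradient descent on the KL divergence between the model's outputs on the noisy and clean input, with a term weighted by λ that rewards larger noise; the use of the L2 norm and the sign of that term are assumptions.

```latex
\epsilon_i^{\,t+1} = \epsilon_i^{\,t} - \eta_{opt}\,
\nabla_{\epsilon}\!\left[
  D_{KL}\!\left( f_\theta\!\left(x_i + \epsilon_i^{\,t}\right) \,\middle\|\, f_\theta\!\left(x_i\right) \right)
  - \lambda \left\lVert \epsilon_i^{\,t} \right\rVert_2
\right]
```

Read this way, each iteration makes the noise as large as possible while keeping the model's prediction on the noisy sample close to its prediction on the clean sample.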

[0066] This improvement in data poisoning robustness enhances the model's ability to resist data poisoning of the specified dataset without a significant decline in performance. In this embodiment, the classification accuracy of the defended model is used as the evaluation criterion.

[0067] The comprehensive evaluation module summarizes the evaluation results of the aforementioned modules and provides an overall evaluation of the model through weighted calculation, allowing users to have the clearest and most intuitive understanding of the model's security status.

[0068] Specifically, this embodiment actually deploys 11 types of metrics, listed below:

[0069]

[0070]

[0071] Based on the evaluation values of each indicator obtained from the aforementioned modules, this comprehensive evaluation module performs normalization and weighting. The weights can be set by the user according to actual needs. This invention also provides preset weights, determined using the entropy weighting method based on a large number of test results. Finally, the overall score of the model is given through weighting, calculated using the following formula:

$$\text{Score} = \sum_{i=1}^{n} W_i \cdot \text{Score}_i$$

where $W_i$ is the weight of the i-th indicator and satisfies $\sum_{i=1}^{n} W_i = 1$, $\text{Score}_i$ is the normalized evaluation value of the i-th indicator, and $n$ is the total number of indicators. The overall score and the results of each test are displayed together.

This invention also provides a method for evaluating the security of deep learning models, comprising:

[0072] Step S1: Collect information about the model under test and perform preprocessing;

[0073] Step S2: Test the model to be tested and obtain the test results;

[0074] The tests include adversarial attack testing, data poisoning robustness testing, and backdoor trigger reverse engineering testing.

[0075] Step S3: Summarize and analyze all test results, and output the final evaluation results.
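The aggregation performed in step S3 can be sketched as a weighted sum of normalized indicator values, with preset weights derived by the entropy weighting method. The indicator values and the history table below are made-up numbers. Normalizing user-supplied weights so that they sum to one is an assumption about the constraint on the weights.

```python
import math
from typing import List


def entropy_weights(history: List[List[float]]) -> List[float]:
    """Entropy weight method: rows are tested models, columns are non-negative indicators."""
    n_models, n_indicators = len(history), len(history[0])
    divergences = []
    for j in range(n_indicators):
        column = [row[j] for row in history]
        total = sum(column)
        probs = [v / total for v in column]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n_models)
        divergences.append(1.0 - entropy)  # more spread across models gives more weight
    total_div = sum(divergences)
    if total_div <= 0:
        return [1.0 / n_indicators] * n_indicators  # all indicators equally uninformative
    return [d / total_div for d in divergences]


def overall_score(scores: List[float], weights: List[float]) -> float:
    """Weighted sum of normalized scores; weights are rescaled to sum to one."""
    total_weight = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total_weight


# Hypothetical results of two earlier test runs for two indicators.
history = [[0.9, 0.2], [0.9, 0.8]]
preset = entropy_weights(history)

# A user who cares most about the first indicator overrides the preset weights.
user_score = overall_score([0.8, 0.5, 1.0], [2.0, 1.0, 1.0])
```

In the toy history the first indicator is identical across runs, so the entropy method gives it almost no weight, while the user-defined weights shift the score toward whichever indicators the user emphasizes.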

[0076] The purpose of this invention is to provide security performance analysis for trainers of deep learning models. Currently, most integrated platforms for deep learning attack and defense methods focus on the effectiveness of the attack and defense algorithms but lack analysis of the models under test; other platforms focus on a single attack method and lack comprehensiveness. This invention provides trainers with security analysis of their trained models and integrates multiple attack algorithms, including adversarial example attacks, backdoor attacks, and data poisoning attacks. Analyzing the model's performance before and after attacks and defenses lets users see clearly how their model's performance changes under each attack type. With this invention, users can comprehensively evaluate their models on a unified platform and better understand the model's security.

[0077] Those skilled in the art will understand that, besides implementing the system and its various devices, modules, and units provided by this invention in the form of purely computer-readable program code, the same functions can be achieved entirely through logical programming of the method steps, making the system and its various devices, modules, and units of this invention function in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, and embedded microcontrollers. Therefore, the system and its various devices, modules, and units provided by this invention can be considered as a hardware component, and the devices, modules, and units included therein for implementing various functions can also be considered as structures within the hardware component; alternatively, the devices, modules, and units for implementing various functions can be considered as both software modules implementing the method and structures within the hardware component.

[0078] In the description of this application, it should be understood that the terms "upper", "lower", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings. They are only for the convenience of describing this application and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, they should not be construed as limitations on this application.

[0079] Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the specific embodiments described above, and those skilled in the art can make various changes or modifications within the scope of the claims, which do not affect the essence of the present invention. Unless otherwise specified, the embodiments and features described in this application can be arbitrarily combined with each other.

Claims

1. A deep learning model security assessment system, comprising a preprocessing module, a robustness testing module, a security defense module, and a comprehensive evaluation module, characterized in that: the preprocessing module converts the model under test into a type that the other modules can read; the robustness testing module includes an adversarial robustness evaluation module, a backdoor robustness evaluation module, and a poisoning robustness evaluation module; the security defense module provides a solution to enhance the robustness of the model under test; and the comprehensive evaluation module summarizes the evaluation indicators of all modules and outputs an overall evaluation of the model under test.

2. The deep learning model security assessment system according to claim 1, characterized in that the robustness testing module sequentially performs adversarial robustness evaluation, backdoor robustness evaluation, and poisoning robustness evaluation on the model under test and obtains the evaluation results; the adversarial robustness evaluation includes examining the model's performance against an attacker with white-box access to the model; the backdoor robustness evaluation includes checking whether a backdoor was inserted into the model during training; and the poisoning robustness evaluation includes examining the model's performance when faced with a poisoned dataset.

3. The deep learning model security assessment system according to claim 1, characterized in that the adversarial robustness evaluation module uses the FGSM method to detect whether the model under test can give a correct judgment result when facing malicious samples.

4. The deep learning model security assessment system according to claim 3, characterized in that the adversarial robustness evaluation module identifies, within a preset perturbation range, the attack input that maximizes the probability of model misjudgment, causing the model to misjudge; under the same method and the same perturbation level, a smaller decrease in output accuracy indicates stronger adversarial robustness, and a larger decrease indicates weaker adversarial robustness.

5. The deep learning model security assessment system according to claim 1, characterized in that the backdoor robustness evaluation module uses the Neural Cleanse method to detect whether the model under test has been attacked and, if so, performs an attack removal operation.

6. The deep learning model security assessment system according to claim 5, characterized in that the backdoor robustness evaluation module includes reconstructing the backdoor trigger, filtering backdoor categories based on the style and size of the backdoor trigger, and using the median absolute deviation index to measure outliers to obtain anomaly indicators.

7. The deep learning model security assessment system according to claim 1, characterized in that the poisoning robustness evaluation module uses the Poison Frogs method to examine the extent to which the normal working ability of the model under test is interfered with when faced with malicious datasets.

8. The deep learning model security assessment system according to claim 1, characterized in that the security defense module includes removing backdoors and improving data poisoning robustness; removing backdoors involves randomly selecting a certain proportion of images from the original dataset that have been attacked by backdoors, injecting backdoor triggers without changing the labels of these data, and then retraining the model after mixing them with other unmodified original data; improving data poisoning robustness involves generating a unique noise for each piece of data in the preset dataset based on the FriendlyNoise method, and using the data with superimposed noise when training the model.

9. The deep learning model security assessment system according to claim 1, characterized in that the comprehensive evaluation module summarizes the evaluation results of all modules and provides an overall evaluation of the model through weighted calculation.

10. A method for security assessment of deep learning models, based on the deep learning model security assessment system according to any one of claims 1-9, characterized in that the method includes the following steps: Step S1: collect information about the model under test and perform preprocessing; Step S2: test the model under test and obtain the test results, wherein the tests include adversarial attack testing, data poisoning robustness testing, and backdoor trigger reverse engineering testing; Step S3: summarize and analyze all test results, and output the final evaluation results.
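As an illustration of the median absolute deviation outlier measure named in claim 6, the following sketch computes an anomaly index for each class from the size of its reconstructed trigger. It is not part of the claims and rests on assumptions: the consistency constant 1.4826 and the threshold of 2 follow the published Neural Cleanse method rather than anything stated in this text, and the function names and sample trigger sizes are hypothetical.

```python
from statistics import median

def anomaly_indices(trigger_sizes, consistency=1.4826):
    """Anomaly index per class: |size - median| / (consistency * MAD)."""
    center = median(trigger_sizes)
    mad = median(abs(size - center) for size in trigger_sizes)
    if mad == 0:
        raise ValueError("MAD is zero; trigger sizes are too uniform to rank")
    return [abs(size - center) / (consistency * mad) for size in trigger_sizes]

def suspicious_labels(trigger_sizes, threshold=2.0):
    """Flag classes whose trigger is abnormally small (assumed threshold)."""
    center = median(trigger_sizes)
    indices = anomaly_indices(trigger_sizes)
    return [
        label
        for label, (size, index) in enumerate(zip(trigger_sizes, indices))
        if size < center and index > threshold
    ]

# Hypothetical L1 norms of the reconstructed trigger for seven classes;
# class 4 needs a far smaller trigger than the rest.
sizes = [95.0, 100.0, 102.0, 98.0, 20.0, 101.0, 99.0]
flagged = suspicious_labels(sizes)
```

Only unusually small triggers are flagged here, since a backdoored class can typically be reached with a much smaller perturbation than a clean one.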

Citation Information

Patent Citations

Deep learning model defense methods and deep learning models against adversarial attacks
CN113127857B
Robustness evaluation and enhancement system of artificial intelligence image classification model
CN111950628A
Generalization safety evaluation method for deep learning image classification model
CN112464245A
