Model detection processing method and apparatus

By generating adversarial examples in the intelligent model and calculating detection metrics, the security and accuracy issues caused by data attacks are resolved, and the robustness and interpretability of the model are improved.

CN117252270BActive Publication Date: 2026-06-16ALIPAY (HANGZHOU) INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ALIPAY (HANGZHOU) INFORMATION TECH CO LTD
Filing Date
2023-07-07
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

In the application of intelligent models, data attacks can lead to security and accuracy issues, limiting their further application in related fields.

Method used

By extracting target data from a corpus, generating adversarial samples using an adversarial model, and inputting these samples into the model to be detected for processing, the sample processing results are obtained. Detection indicators are then calculated based on the model type, including robustness, interpretability, and data security detection.

🎯Benefits of technology

It improves the effectiveness and comprehensiveness of model detection, helps adjust the model to meet expected output and performance, and enhances the model's security and interpretability.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN117252270B_ABST
    Figure CN117252270B_ABST
Patent Text Reader

Abstract

The embodiment of the specification provides a model detection processing method and device, wherein the model detection processing method comprises the following steps: in a detection process of a to-be-detected model, target corpus is extracted from a corpus according to a model type of the to-be-detected model; and the extracted target corpus is input into an adaptive adversarial model to generate an adversarial sample, a detection sample is determined based on the generated adversarial sample, the detection sample is further input into the to-be-detected model for processing, a sample processing result is obtained, and then detection index calculation is performed in a detection dimension corresponding to the model type according to the sample processing result.
Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] This document relates to the field of data processing technology, and in particular to a model detection processing method and apparatus. Background Technology

[0002] With the rapid development of artificial intelligence-related technologies, intelligent models built based on machine learning have been widely used in many fields. However, the application of intelligent models may face various data attacks, which has raised questions about the security and accuracy of intelligent models and limited their further application in related fields. Summary of the Invention

[0003] This specification provides one or more embodiments of a model detection processing method, comprising: extracting target corpus from a corpus according to the model type of the model to be detected; inputting the target corpus into an adapted adversarial model to generate adversarial samples; inputting detection samples determined based on the adversarial samples into the model to be detected for processing to obtain sample processing results; and calculating detection metrics under the detection dimension corresponding to the model type based on the sample processing results.

[0004] This specification provides one or more embodiments of a model detection processing apparatus, comprising: a corpus extraction module configured to extract target corpus from a corpus based on the model type of the model to be detected; a sample generation module configured to input the target corpus into an adapted adversarial model to generate adversarial samples; a sample processing module configured to input detection samples determined based on the adversarial samples into the model to be detected for processing to obtain sample processing results; and a detection index calculation module configured to calculate detection indexes based on the sample processing results in a detection dimension corresponding to the model type.

[0005] This specification provides one or more embodiments of a model detection processing device, including: a processor; and a memory configured to store computer-executable instructions, which, when executed, cause the processor to: extract target corpus from a corpus according to the model type of the model to be detected; input the target corpus into an adapted adversarial model to generate adversarial samples, thereby obtaining adversarial samples; input detection samples determined based on the adversarial samples into the model to be detected for processing, thereby obtaining sample processing results; and calculate detection metrics in the detection dimension corresponding to the model type based on the sample processing results.

[0006] This specification provides one or more embodiments of a storage medium for storing computer-executable instructions, which, when executed by a processor, implement the following process: extracting target corpus from a corpus according to the model type of the model to be detected; inputting the target corpus into an adapted adversarial model to generate adversarial samples; inputting detection samples determined based on the adversarial samples into the model to be detected for processing to obtain sample processing results; and calculating detection metrics under the detection dimension corresponding to the model type based on the sample processing results. Attached Figure Description

[0007] To more clearly illustrate the technical solutions in one or more embodiments of this specification or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in this specification. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0008] Figure 1 A schematic diagram illustrating the implementation environment of a model detection processing method provided in one or more embodiments of this specification;

[0009] Figure 2 A flowchart illustrating a model detection processing method provided in one or more embodiments of this specification;

[0010] Figure 3 A flowchart illustrating a model detection processing method for image classification model scenarios, provided in one or more embodiments of this specification.

[0011] Figure 4 A flowchart illustrating a model detection processing method applied to a dialogue generation model scenario, provided in one or more embodiments of this specification.

[0012] Figure 5 A schematic diagram of a model detection processing device provided in one or more embodiments of this specification;

[0013] Figure 6 This is a schematic diagram of the structure of a model detection and processing device provided in one or more embodiments of this specification. Detailed Implementation

[0014] To enable those skilled in the art to better understand the technical solutions in one or more embodiments of this specification, the technical solutions in one or more embodiments of this specification will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this specification, and not all of the embodiments. Based on one or more embodiments of this specification, all other embodiments obtained by those skilled in the art without creative effort should fall within the protection scope of this document.

[0015] The model detection processing method provided in one or more embodiments of this specification is applicable to the implementation environment of a model detection platform. (Refer to...) Figure 1 The implementation environment includes at least a server 101 for model detection processing. Additionally, it may include a corpus 102 for storing corpus data, an adversarial model library 103 for storing adversarial models, and a detection model library 104 for storing interpretation models and data detection models. The server 101 can be one or more servers, a server cluster consisting of several servers, or a cloud server from a cloud computing platform, used for processing the models to be detected.

[0016] In this implementation environment, during the detection process of the model to be detected, server 101 reads the target corpus from corpus 102. After extracting the target corpus, adversarial samples are generated by inputting the target corpus into the adversarial model based on the adversarial model library 103 that is adapted to the target corpus. The detection samples determined based on the adversarial samples are then input into the model to be detected for processing to obtain the sample processing results. Furthermore, when performing interpretable detection or data security detection on the model to be detected in the interpretable dimension or data security dimension, the corresponding interpretable model or data detection model can be read from the detection model library 104, thereby realizing interpretable detection or data security detection of the model to be detected in the interpretable dimension or data security dimension.

[0017] One or more embodiments of a model detection processing method provided in this specification are as follows:

[0018] Reference Figure 2 The model detection processing method provided in this embodiment specifically includes steps S202 to S208.

[0019] Step S202: Extract target corpus from the corpus according to the model type of the model to be detected.

[0020] The detection models described in this embodiment include discriminative models and generative models. Discriminative models are models built by learning the mapping relationship between inputs and outputs, and are used to predict new outputs. Discriminative models can be support vector machine models, neural network models, or perceptron models, etc. In specific application areas, discriminative models can be classification models for classifying text, speech, images, or tables. This embodiment uses an image classification model as an example. Generative models are models built by learning the distribution of data, and are used to generate new data. Generative models can be Naive Bayes models, decision tree models, etc. This embodiment uses a dialogue generation model (such as ChatGPT (Chat Generative Pre-training Transformer) model) as an example.

[0021] In practice, during the detection of the model to be tested, the model may be a discriminative model or a generative model. Therefore, in the process of extracting corpus data for model detection, corpus extraction is performed based on the model type of the model to be tested, specifically, corpus extraction can be performed based on the model type of discriminative or generative model. Alternatively, the input data type of the model to be tested can be used as the model type, and corpus extraction can be performed based on the data type to obtain the target corpus for the detection of the model. For example, for an image classification model, images are extracted from the corpus as the target corpus; for ChatGPT, text, speech, or images are extracted from the corpus as the target corpus.

[0022] Step S204: Input the target corpus into the adapted adversarial model to generate adversarial samples and obtain adversarial samples.

[0023] In the specific detection process, after extracting the target corpus for the model to be detected, adversarial samples of the model to be detected are generated based on the target corpus. Specifically, adversarial samples are generated by inputting the target corpus into an adversarial model that is compatible with the target corpus. Alternatively, adversarial samples can be generated by inputting the target corpus into an adversarial model that is compatible with the model to be detected or the model type of the model to be detected.

[0024] In practical applications, in order to improve the success rate of adversarial sample generation and to enhance the comprehensiveness of adversarial samples, this embodiment provides an adversarial model library consisting of multiple adversarial models. Based on this, during the adversarial sample generation process, an appropriate adversarial model can be selected from the adversarial model library to generate adversarial samples.

[0025] Specifically, in one optional implementation of this embodiment, the target corpus is input into an adapted adversarial model to generate adversarial samples, thereby obtaining adversarial samples, including:

[0026] Based on the corpus information of the target corpus, an adversarial model that matches the corpus information is selected from the adversarial model library;

[0027] The target corpus is input into the screened adversarial model to generate adversarial samples, thus obtaining the adversarial samples.

[0028] Taking image classification models as an example, during the detection and processing of image classification models, the images to be classified extracted from the corpus can be input into adversarial models to generate adversarial samples. Here, the adversarial model can be a random processing model that performs random cropping or random rotation of the image, a noise model that performs distortion or noise transformation of the image, a transformation model that performs blur transformation or digital domain transformation of the image, or a transformation model that performs facial attribute transformation or image style transformation of the image. These adversarial models are all stored in the adversarial model library, and the corresponding adversarial model can be selected from the adversarial model library to generate adversarial samples according to the actual processing needs.

[0029] If the random processing model is selected as the adversarial model in the adversarial model library, the images to be classified extracted from the corpus will be input into the random processing model for random cropping or random rotation, and the processed images will be used as adversarial samples.

[0030] Step S206: Input the detection sample determined based on the adversarial sample into the model to be detected for processing to obtain the sample processing result.

[0031] The above process involves generating adversarial samples by inputting the target corpus into the adversarial model, and then determining the detection samples to be input into the model to be detected based on the adversarial samples. Specifically, the method of determining the detection samples based on the adversarial samples varies depending on the model type.

[0032] Specifically, if the model type is a discriminative model, then the adversarial examples can be used as the detection samples. If the model type is a generative model, the following optional implementation method is used to determine the detection samples: the adversarial examples are written into the question adversarial template of the model to be detected to generate questions, and the generated question data is used as the detection samples.

[0033] Among them, the adversarial question template refers to the question template used to generate the input requirements of the model to be tested. The question template is an adversarial question guidance template designed to guide intent. For example, when performing data security dimension detection, the adversarial question template is used to try to guide the model to generate results that do not comply with data security rules, thereby reducing the possibility of the model generating results that do not comply with data security rules from an adversarial perspective.

[0034] For example, when performing data security dimension detection on the ChatGPT model, after inputting the text corpus extracted from the corpus into the adversarial model to generate adversarial samples, the adversarial samples are written into the adversarial template to obtain adversarial question text for detecting the ChatGPT model.

[0035] In practice, after determining the detection samples based on adversarial examples, the detection samples are input into the model to be detected for processing. Specifically, the processing performed on the model to be detected after the detection samples are input varies depending on the model type. The following describes in detail the processing process performed when the detection samples are input into the generative model and the processing process performed when the detection samples are input into the discriminative model.

[0036] In one optional implementation of this embodiment, if the model type is a discriminative model, the detection samples determined based on the adversarial examples are input into the model to be detected for processing to obtain the sample processing results, including:

[0037] The adversarial sample is identified as the detection sample, and the detection sample is input into the detection model for classification processing to obtain the sample classification result.

[0038] As described above, when the model type of the model to be detected is a generative model, in addition to generating detection samples by writing adversarial examples into the problem adversarial template of the model to be detected, another optional implementation provided in this embodiment, if the model type is a generative model, inputs the detection samples determined based on the adversarial examples into the model to be detected for processing to obtain sample processing results, including:

[0039] The problem data is input into the model to be detected to generate the adversarial template corresponding to the problem, and the generated data is used as the sample processing result.

[0040] Following the previous example, after writing the adversarial sample into the adversarial template to obtain the adversarial question text for detection by the ChatGPT model, the obtained adversarial question text is input into the ChatGPT model for generation processing, and the answer output by the ChatGPT model is the sample processing result.

[0041] Step S208: Based on the sample processing results, calculate the detection index under the detection dimension corresponding to the model type.

[0042] The detection dimension described in this embodiment refers to the dimension used to detect the performance or metrics of the model under test. This detection dimension can be a robustness dimension for detecting the robustness of the model, an interpretability dimension for detecting the interpretability of the model, or a data security dimension for detecting the data security of the model. It should be noted that the detection dimension used for detection varies depending on the type of model under test. For example, robustness and / or interpretability dimensions can be detected for discriminative models, while data security dimensions can be detected for generative models.

[0043] In the robust dimension, the detection processing performed on the model to be detected is to detect whether the model to be detected can maintain the expected performance and / or the expected output when the input data is disturbed or changed. In this embodiment, the target corpus, which is used as input data, is subjected to adversarial processing to disturb or change the target corpus, so as to detect whether the model to be detected can maintain the expected performance and / or the expected output after the disturbance or change processing.

[0044] In the interpretability dimension, the detection processing performed on the detection model refers to the explanation or description of the prediction results or decision-making process of the detection model, making the working principle and judgment basis of the model clearer and more transparent.

[0045] In terms of data security, the detection processing performed on the model to be detected refers to the detection of content security, data security, and other aspects of the output data of the model. For example, the ChatGPT model may be detected for "unfriendly behavior", "minor protection", "public safety", "negative content", "personal privacy", "institutional privacy", "bias and discrimination", "dangerous behavior", "mental health", or "false information".

[0046] In practice, when the model type of the model to be tested is a discriminative model, the detection index is calculated according to the detection dimension corresponding to the model type of the sample processing results. Specifically, the detection index can be calculated under the robust dimension and the interpretable dimension. In this way, the expected performance or expected output of the model to be tested under the robust dimension and the interpretable dimension can be determined by the calculation of the detection index. The calculation process of the detection index under these two detection dimensions is explained in detail below.

[0047] In one optional implementation of this embodiment, based on the sample processing results, detection metrics are calculated under the detection dimension corresponding to the model type, including:

[0048] Based on the sample classification results and the classification labels of the target corpus, determine the classification accuracy of the model to be detected for the detected samples;

[0049] Based on the classification accuracy, a first adjustment strategy for the model to be detected under the robust dimension is generated.

[0050] The first adjustment strategy includes an adjustment strategy for adjusting the parameters of the model to be detected; for example, if the classification accuracy of the image classification model for adversarial examples is lower than the expected classification accuracy of the image classification model, then an adjustment strategy for adjusting the parameters of the image classification model is generated so that after adjusting the parameters of the image classification model according to the adjustment strategy, the classification accuracy of the image classification model for adversarial examples can approach the expected classification accuracy.

[0051] In addition to the robust dimension detection processing provided above, interpretable dimension detection processing can also be performed on the model to be detected. Specifically, in an optional implementation method provided in this embodiment, interpretable dimension detection processing is performed in the following way:

[0052] Construct a detection set under an interpretable dimension based on the model to be detected, the interpretable model, and the detection samples;

[0053] The detection set is input into an interpretable algorithm for at least one detection subclass under the interpretable dimension for interpretation evaluation, and the interpretability score of the model to be detected in the at least one detection subclass is obtained.

[0054] Furthermore, based on obtaining the interpretability score of at least one detection subclass of the model to be detected in the interpretability dimension, an adjustment strategy for adjusting the model to be detected can be determined from the interpretability score. The optional implementation method for generating the adjustment strategy is as follows: Based on the interpretability score of the at least one detection subclass, a second adjustment strategy for the model to be detected in the interpretability dimension is generated.

[0055] Optionally, the detection subclass includes at least one of the following: correctness subclass, completeness subclass, difference subclass, and simplicity subclass.

[0056] For example, in the process of calculating the detection index of the interpretable dimension of an image classification model, an interpretable model that interprets the current image classification model is selected from the detection model library. A detection tuple is constructed based on the image classification model, the interpretable model, and the detection samples. The constructed detection tuple is then input into the interpretable algorithms of the correctness subclass, completeness subclass, difference subclass, and simplicity subclass, respectively. The interpretable scores of the image classification model in the correctness subclass, completeness subclass, difference subclass, and simplicity subclass are calculated respectively, and the interpretable scores of the image classification model in the four subclasses of correctness subclass, completeness subclass, difference subclass, and simplicity subclass are obtained.

[0057] Furthermore, based on the interpretability scores of these four subcategories, an adjustment strategy for adjusting the image classification model is determined. Finally, the image classification model is adjusted according to the determined adjustment strategy so that the adjusted image classification model can obtain higher or expected interpretability scores in the four subcategories of correctness, completeness, dissimilarity, and simplicity.

[0058] Furthermore, in specific implementation, when the model type of the model to be tested is a generative model, during the process of calculating the detection index under the detection dimension corresponding to the model type based on the sample processing results, the detection index can be calculated under the data security dimension. In this way, the performance of the output data of the model to be tested under the data security dimension can be determined through the calculation of the detection index.

[0059] Specifically, based on the above-mentioned process of inputting problem data into the model to be detected and generating data corresponding to the problem adversarial template, this embodiment provides an optional implementation method in which, according to the sample processing results, detection indicators are calculated under the detection dimension corresponding to the model type, including:

[0060] The generated data is input into at least one data detection model in the data security dimension of the detection model library for data detection.

[0061] Based on the data detection results output by the data detection model, calculate the data security index of the model to be detected in at least one detection category.

[0062] Furthermore, based on obtaining at least one data security indicator for the detection model under the data security dimension, an adjustment strategy for adjusting the detection model can be determined from the data security indicator. The optional implementation method for generating the adjustment strategy is as follows: summarize the data security indicators for at least one detection category under the data security dimension to obtain a summary result; generate a third adjustment strategy for the detection model based on the summary result.

[0063] For example, during the calculation of data security dimension detection indicators for the ChatGPT model, a data detection model is selected from the detection model library to perform data security detection on the output data of the current ChatGPT model in at least one detection category. These detection categories include: "unfriendly behavior," "minor protection," "public safety," "negative content," "personal privacy," "institutional privacy," "bias and discrimination," "dangerous behavior," "mental health," and / or "false information." The detection model library stores data detection models for each detection category.

[0064] After selecting one or more data detection models for each detection category from the detection model library, the output data of the ChatGPT model is input into the selected data detection models for data detection. Based on the data detection results, the risk rate or sensitivity rate of the ChatGPT model in each detection category is calculated. If the risk rate or sensitivity rate of a certain detection category is relatively high, an adjustment strategy is generated for that detection category to adjust the ChatGPT model so that the adjusted ChatGPT model meets the expected risk rate or sensitivity rate in each detection category.

[0065] In summary, the one or more model detection processing methods provided in this embodiment, during the detection processing of the model to be detected, start from the model type of the model to be detected, extract the corresponding target corpus from the corpus, generate corresponding adversarial samples by inputting the target corpus into the adapted adversarial model, determine the detection samples to be input into the model to be detected based on the obtained adversarial samples, input the determined detection samples into the model to be detected for processing to obtain the sample processing results, and calculate the detection index under the detection dimension corresponding to the model type based on the sample processing results. This improves the model detection effect and makes the detection processing of the model to be detected more comprehensive, thereby helping to make the output or performance of the model obtained after detection processing more in line with expectations.

[0066] The following example uses a model detection processing method provided in this embodiment to illustrate its application in an image classification model scenario. Figure 3 The model detection processing method provided in this embodiment will be further explained below. See [link to documentation]. Figure 3 The model detection processing method applied to image classification model scenarios includes the following steps.

[0067] Step S302: Extract image samples for the image classification model from the corpus.

[0068] Step S304: Input the image sample into the adapted adversarial model to generate an adversarial image and obtain the adversarial image.

[0069] Step S306: The adversarial image is used as the input image classification model for image classification processing to obtain the image classification result.

[0070] Step S308: Calculate the classification accuracy of the image classification model based on the image classification results and the classification labels of the image samples.

[0071] Step S310: Generate an adjustment strategy for the image classification model under the robust dimension based on the classification accuracy.

[0072] Step S312: Construct a detection set in an interpretable dimension based on the image classification model, the interpretation model, and the detected images.

[0073] Step S314: Input the detection set into the interpretable algorithm of the detection subclass under the interpretable dimension for interpretation evaluation, and obtain the interpretability score of the image classification model in the detection subclass.

[0074] Step S316: Based on the interpretability score of the detected subclass, generate an adjustment strategy for the image classification model in the interpretability dimension.

[0075] The following example uses a model detection processing method provided in this embodiment to illustrate its application in a dialogue generation model scenario. Figure 4 The model detection processing method provided in this embodiment will be further explained below. See [link to documentation]. Figure 4 The model detection and processing method applied to dialogue generation model scenarios includes the following steps.

[0076] Step S402: Extract the question data of the dialogue generation model from the corpus.

[0077] Step S404: Input the problem corpus into the adapted adversarial model to generate adversarial problems.

[0078] Step S406: Write the adversarial question into the adversarial template of the dialogue generation model to generate the question text.

[0079] Step S408: Input the question text into the dialogue generation model for dialogue generation processing to obtain dialogue generation data.

[0080] Step S410: Input the dialogue generation data into the data detection model library and perform data detection on at least one detection model under the data security dimension.

[0081] Step S412: Based on the data detection results output by the data detection model, calculate the data security index of the dialogue generation model in at least one detection category.

[0082] Step S414: Summarize the data security indicators of at least one detection category to obtain the summary result.

[0083] Step S416: Generate an adjustment strategy for the dialogue generation model based on the summarized results.

[0084] The following is an embodiment of a model detection and processing device provided in this specification:

[0085] In the above embodiments, a model detection processing method is provided, and correspondingly, a model detection processing device is also provided, which will be described below with reference to the accompanying drawings.

[0086] Reference Figure 5 The diagram shows a schematic representation of an embodiment of a model detection processing device provided in this embodiment.

[0087] Since the apparatus embodiments correspond to the method embodiments, the descriptions are relatively simple. For relevant parts, please refer to the corresponding descriptions of the method embodiments provided above. The apparatus embodiments described below are merely illustrative.

[0088] This embodiment provides a model detection processing device, including:

[0089] The corpus extraction module 502 is configured to extract target corpus from the corpus according to the model type of the model to be detected;

[0090] The sample generation module 504 is configured to input the target corpus into an adapted adversarial model to generate adversarial samples and obtain adversarial samples.

[0091] The sample processing module 506 is configured to input the detection samples determined based on the adversarial samples into the model to be detected for processing, and obtain the sample processing results.

[0092] The detection index calculation module 508 is configured to calculate the detection index based on the sample processing results and under the detection dimension corresponding to the model type.

[0093] The following is an example of a model detection and processing device provided in this specification:

[0094] Corresponding to the model detection processing method described above, based on the same technical concept, one or more embodiments of this specification also provide a model detection processing device for executing the model detection processing method provided above. Figure 6 This is a schematic diagram of the structure of a model detection and processing device provided in one or more embodiments of this specification.

[0095] This embodiment provides a model detection and processing device, including:

[0096] like Figure 6 As shown, the model detection processing device can vary significantly due to differences in configuration or performance. It may include one or more processors 601 and a memory 602, where one or more application programs or data may be stored. The memory 602 may be temporary or persistent storage. The application programs stored in the memory 602 may include one or more modules (not shown), each module including a series of computer-executable instructions from the model detection processing device. Furthermore, the processor 601 may be configured to communicate with the memory 602 and execute the series of computer-executable instructions stored in the memory 602 on the model detection processing device. The model detection processing device may also include one or more power supplies 603, one or more wired or wireless network interfaces 604, one or more input / output interfaces 605, one or more keyboards 606, etc.

[0097] In one specific embodiment, the model detection processing apparatus includes a memory and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs may include one or more modules, and each module may include a series of computer-executable instructions for the model detection processing apparatus, and is configured to be executed by one or more processors. The one or more programs include computer-executable instructions for performing the following:

[0098] Based on the model type of the model to be detected, extract the target corpus from the corpus;

[0099] The target corpus is input into an adapted adversarial model to generate adversarial samples, thereby obtaining adversarial samples;

[0100] The detection samples determined based on the adversarial samples are input into the model to be detected for processing to obtain the sample processing results;

[0101] Based on the sample processing results, the detection index is calculated under the detection dimension corresponding to the model type.

[0102] This specification provides an example of a storage medium as follows:

[0103] Corresponding to the model detection processing method described above, based on the same technical concept, one or more embodiments of this specification also provide a storage medium.

[0104] The storage medium provided in this embodiment is used to store computer-executable instructions, which, when executed by a processor, implement the following process:

[0105] Based on the model type of the model to be detected, extract the target corpus from the corpus;

[0106] The target corpus is input into an adapted adversarial model to generate adversarial samples, thereby obtaining adversarial samples;

[0107] The detection samples determined based on the adversarial samples are input into the model to be detected for processing to obtain the sample processing results;

[0108] Based on the sample processing results, the detection index is calculated under the detection dimension corresponding to the model type.

[0109] It should be noted that the embodiments of a storage medium described in this specification and the embodiments of a model detection processing method described in this specification are based on the same inventive concept. Therefore, the specific implementation of this embodiment can be referred to the implementation of the corresponding method described above, and the repeated parts will not be described again.

[0110] The various embodiments in this specification are described in a progressive manner. For the same or similar parts between the various embodiments, please refer to each other. Each embodiment focuses on describing the differences from other embodiments. For example, the device embodiment, equipment embodiment, and storage medium embodiment are all similar to the method embodiment, so the description is relatively simple. For reading the relevant content of the device embodiment, equipment embodiment, and storage medium embodiment, please refer to the description of the method embodiment.

[0111] The foregoing has described specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired result. In some embodiments, multitasking and parallel processing are possible or may be advantageous.

[0112] In the 1930s, improvements to a technology could be clearly distinguished as either hardware improvements (e.g., improvements to the circuit structure of diodes, transistors, switches, etc.) or software improvements (improvements to the methodology). However, with technological advancements, many improvements to the methodology today can be considered direct improvements to the hardware circuit structure. Designers almost always obtain the corresponding hardware circuit structure by programming the improved methodology into the hardware circuit. Therefore, it cannot be said that an improvement to the methodology cannot be implemented using a hardware physical module. For example, a Programmable Logic Device (PLD) (e.g., a Field Programmable Gate Array (FPGA)) is such an integrated circuit whose logic function is determined by the user programming the device. Designers can program a digital system themselves to "integrate" it onto a PLD, without needing chip manufacturers to design and manufacture dedicated integrated circuit chips. Furthermore, nowadays, instead of manually manufacturing integrated circuit chips, this programming is mostly implemented using "logic compiler" software. Similar to the software compiler used in program development, the original code before compilation must be written in a specific programming language, called a Hardware Description Language (HDL). There are many HDLs, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language). Currently, the most commonly used are VHDL (Very-High-Speed ​​Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should understand that by simply performing some logic programming on the method flow using one of these hardware description languages ​​and programming it into an integrated circuit, the hardware circuit implementing the logical method flow can be easily obtained.

[0113] The controller can be implemented in any suitable manner. For example, it can take the form of a microprocessor or processor and a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, and embedded microcontrollers. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of the memory. Those skilled in the art will also recognize that, in addition to implementing the controller in purely computer-readable program code form, the same functionality can be achieved by logically programming the method steps to make the controller take the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, and embedded microcontrollers. Therefore, such a controller can be considered a hardware component, and the means included therein for implementing various functions can also be considered as structures within the hardware component. Alternatively, the means for implementing various functions can be considered as both software modules implementing the method and structures within the hardware component.

[0114] The systems, devices, modules, or units described in the above embodiments can be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer. Specifically, a computer can be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or any combination of these devices.

[0115] For ease of description, the above apparatus is described by dividing it into various functional units. Of course, when implementing the embodiments of this specification, the functions of each unit can be implemented in one or more software and / or hardware.

[0116] Those skilled in the art will understand that one or more embodiments of this specification can be provided as a method, system, or computer program product. Therefore, one or more embodiments of this specification may take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this specification may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0117] This specification is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this specification. It will be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create a machine for implementing the flowchart illustrations and / or block diagrams. Figure 1 One or more processes and / or boxes Figure 1 A device that provides the functions specified in one or more boxes.

[0118] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means, which are implemented in a process Figure 1 One or more processes and / or boxes Figure 1 The function specified in one or more boxes.

[0119] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, thereby providing instructions that execute on the computer or other programmable equipment for implementing the process. Figure 1 One or more processes and / or boxes Figure 1 The steps of the function specified in one or more boxes.

[0120] In a typical configuration, a computing device includes one or more processors (CPU), input / output interfaces, network interfaces, and memory.

[0121] Memory may include non-persistent storage in computer-readable media, such as random access memory (RAM) and / or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.

[0122] Computer-readable media includes both permanent and non-permanent, removable and non-removable media that can store information using any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile optical disc (DVD) or other optical storage, magnetic tape, disk storage or other magnetic storage devices, or any other non-transferable medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transient computer-readable media, such as modulated data signals and carrier waves.

[0123] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising at least one…" does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0124] One or more embodiments of this specification can be described in the general context of computer-executable instructions, such as program modules, that are executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform a particular task or implement a particular abstract data type. One or more embodiments of this specification can also be practiced in distributed computing environments where tasks are performed by remote processing devices connected via a communication network. In distributed computing environments, program modules can reside in local and remote computer storage media, including storage devices.

[0125] The above description is merely an embodiment of this document and is not intended to limit the scope of this document. Various modifications and variations can be made to this document by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this document should be included within the scope of the claims of this document.

Claims

1. A model detection processing method, comprising: Based on the model type of the model to be detected, extract the target corpus from the corpus; Based on the corpus information of the target corpus, adversarial models that are compatible with the model to be detected or the model type of the model to be detected are selected from the adversarial model library; The target corpus is input into the screened adversarial model to generate adversarial samples, thus obtaining adversarial samples. If the model type is a discriminative model, the adversarial sample is determined as the detection sample, and the detection sample is input into the model to be detected for classification processing to obtain the sample classification result, and the sample classification result is used as the sample processing result; If the model type is a generative model, the adversarial sample is written into the adversarial template of the model to be detected to generate a question, and the generated question data is used as the detection sample. The detection sample is input into the model to be detected for processing to obtain the sample processing result. Based on the sample processing results, the detection index is calculated under the detection dimension corresponding to the model type.

2. The model detection processing method according to claim 1, wherein the step of calculating the detection index under the detection dimension corresponding to the model type based on the sample processing result includes: Based on the sample classification results and the classification labels of the target corpus, determine the classification accuracy of the model to be detected for the detected samples; Based on the classification accuracy, a first adjustment strategy for the model to be detected under the robust dimension is generated.

3. The model detection processing method according to claim 1, further comprising, after the step of inputting the detection sample determined based on the adversarial example into the model to be detected for processing and obtaining the sample processing result, the method further comprises: Construct a detection set under an interpretable dimension based on the model to be detected, the interpretable model, and the detection samples; The detection set is input into an interpretable algorithm for at least one detection subclass under the interpretable dimension for interpretation evaluation, and the interpretability score of the model to be detected in the at least one detection subclass is obtained.

4. The model detection processing method according to claim 3 further includes: Based on the interpretability score of the at least one detection subclass, a second adjustment strategy for the model to be detected under the interpretability dimension is generated.

5. The model detection processing method according to claim 1, wherein inputting the detection sample determined based on the adversarial example into the model to be detected for processing to obtain the sample processing result includes: The problem data is input into the model to be detected to generate the adversarial template corresponding to the problem, and the generated data is used as the sample processing result.

6. The model detection processing method according to claim 5, wherein the step of calculating the detection index under the detection dimension corresponding to the model type based on the sample processing result includes: The generated data is input into at least one data detection model in the data security dimension of the detection model library for data detection. Based on the data detection results output by the data detection model, calculate the data security index of the model to be detected in at least one detection category.

7. The model detection processing method according to claim 6 further includes: A summary result is obtained by summarizing the data security indicators of at least one detection category under the aforementioned data security dimension. A third adjustment strategy for the model to be detected is generated based on the summarized results.

8. A model detection processing device, comprising: The corpus extraction module is configured to extract target corpus from the corpus based on the model type of the model to be detected. The sample generation module is configured to, based on the corpus information of the target corpus, select adversarial models from the adversarial model library that are compatible with the model to be detected or the model type of the model to be detected; input the target corpus into the selected adversarial models to generate adversarial samples, thereby obtaining adversarial samples; The sample processing module is configured to, if the model type is a discriminative model, identify the adversarial sample as the detection sample, input the detection sample into the model to be detected for classification processing, obtain the sample classification result, and use the sample classification result as the sample processing result; If the model type is a generative model, the adversarial sample is written into the adversarial template of the model to be detected to generate a question, and the generated question data is used as the detection sample. The detection sample is input into the model to be detected for processing to obtain the sample processing result. The detection index calculation module is configured to calculate the detection index based on the sample processing results and under the detection dimension corresponding to the model type.

9. A model detection and processing device, comprising: processor; And, a memory configured to store computer-executable instructions, which, when executed, cause the processor to: Based on the model type of the model to be detected, extract the target corpus from the corpus; Based on the corpus information of the target corpus, adversarial models that are compatible with the model to be detected or the model type of the model to be detected are selected from the adversarial model library; The target corpus is input into the screened adversarial model to generate adversarial samples, thus obtaining adversarial samples. If the model type is a discriminative model, the adversarial sample is determined as the detection sample, and the detection sample is input into the model to be detected for classification processing to obtain the sample classification result, and the sample classification result is used as the sample processing result; If the model type is a generative model, the adversarial sample is written into the adversarial template of the model to be detected to generate a question, and the generated question data is used as the detection sample. The detection sample is input into the model to be detected for processing to obtain the sample processing result. Based on the sample processing results, the detection index is calculated under the detection dimension corresponding to the model type.

10. A storage medium for storing computer-executable instructions, which, when executed by a processor, perform the following process: Based on the model type of the model to be detected, extract the target corpus from the corpus; Based on the corpus information of the target corpus, adversarial models that are compatible with the model to be detected or the model type of the model to be detected are selected from the adversarial model library; The target corpus is input into the screened adversarial model to generate adversarial samples, thus obtaining adversarial samples. If the model type is a discriminative model, the adversarial sample is determined as the detection sample, and the detection sample is input into the model to be detected for classification processing to obtain the sample classification result, and the sample classification result is used as the sample processing result; If the model type is a generative model, the adversarial sample is written into the adversarial template of the model to be detected to generate a question, and the generated question data is used as the detection sample. The detection sample is input into the model to be detected for processing to obtain the sample processing result. Based on the sample processing results, the detection index is calculated under the detection dimension corresponding to the model type.