Model theft detection methods and related equipment
By combining feature enhancement and binary classifiers, the problem of limited applicability and easy bypass of existing model theft detection methods is solved, and model theft can be accurately identified in multiple scenarios without affecting the model's prediction performance.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING UNIV OF POSTS & TELECOMM
- Filing Date
- 2024-03-05
- Publication Date
- 2026-06-30
AI Technical Summary
Existing model-based theft detection methods have limited applicability, are prone to false alarms, and are easily bypassed by adaptive attacks.
By acquiring training images corresponding to the pre-trained target model, applying the Laplacian operator for feature enhancement, and feeding the prediction vector and the model weight gradient obtained on the feature-enhanced images into a binary classifier, model theft can be identified in various theft scenarios.
It achieves accurate identification in various theft scenarios without affecting the prediction performance of the target model, avoiding false alarms and adaptive attack bypass.
Figure CN118349843B_ABST (abstract drawing; image not reproduced)
Description
Technical Field
[0001] This application relates to the field of artificial intelligence technology, and in particular to a method and related equipment for identifying model theft.
Background Technology
[0002] Model theft attacks, also known as model extraction attacks, aim to acquire knowledge about a target model, including weight parameters, private data, and model structure. Using the knowledge gained through model theft attacks, attackers can construct a model with similar performance to the target model, or further solve for the model's internal parameters, such as decrypting the model or leaking private information, thus posing a significant threat to the target model.
[0003] Based on the above, existing model theft identification methods have limited applicability, are prone to false alarms, and are easily bypassed by adaptive attacks.
Summary of the Invention
[0004] In view of this, the purpose of this application is to propose a model theft identification method and related equipment to solve the above-mentioned technical problems.
[0005] To achieve the above objectives, the first aspect of this application provides a method for identifying model theft, comprising:
[0006] Acquire training images corresponding to a pre-trained target model, and a model to be identified corresponding to the target model;
[0007] Select a target training image from the training images according to a preset conversion rate;
[0008] The target training image is converted into a grayscale image based on preset channel pixel values;
[0009] Feature enhancement is performed on the grayscale image using the Laplacian operator to obtain a feature-enhanced image.
[0010] The feature-enhanced image is input into the model to be identified, the model to be identified is used to output a first prediction vector corresponding to the model to be identified, and the first model weight gradient corresponding to the model to be identified and the feature-enhanced image is obtained.
[0011] The first prediction vector is input into a first model, corresponding to a first theft scenario, in a pre-trained binary classifier; the first model produces a first classification result, which is output by the binary classifier. At the same time, the first model weight gradient is input into a second model, corresponding to a second theft scenario, in the binary classifier; the second model produces a second classification result, which is output by the binary classifier. The first classification result indicates whether the model to be identified was stolen from the target model in the first theft scenario, and the second classification result indicates whether the model to be identified was stolen from the target model in the second theft scenario.
[0012] Optionally, the training process of the target model includes:
[0013] Obtain the first real label corresponding to the training image and construct the first pre-trained model;
[0014] The feature-enhanced image, together with the remaining training images (those other than the target training image), is input into the first pre-trained model, and the first pre-trained model outputs the first prediction result.
[0015] A first loss function is constructed based on the difference between the first true label and the first prediction result. The first loss function is minimized. The first pre-trained model is trained and adjusted based on the result of the processing of the first loss function during the minimization process to obtain a trained first pre-trained model. The features corresponding to the feature-enhanced image are learned into the trained first pre-trained model. The trained first pre-trained model is used as the target model.
[0016] Optionally, the training process of the binary classifier includes:
[0017] Construct the first initial model;
[0018] The feature-enhanced image is input into the target model and the pre-trained comparison model respectively. The target model outputs a corresponding second prediction vector and marks the second prediction vector with a preset first identifier. At the same time, the pre-trained comparison model outputs a corresponding third prediction vector and marks the third prediction vector with a preset second identifier.
[0019] The second prediction vector labeled with the first identifier and the third prediction vector labeled with the second identifier are input into the first initial model, and the first initial model outputs the first prediction classification result.
[0020] A second loss function is constructed based on the difference between the first predicted classification result and the corresponding first or second identifier. The second loss function is minimized, and the first initial model is trained and adjusted according to the result of the minimization to obtain the first model. The first model is then added to the binary classifier.
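The training steps above can be sketched as follows. This is only an illustration under stated assumptions: the source does not specify the classifier architecture, so a plain logistic regression stands in for the first initial model, the first and second identifiers are assumed to be the labels 1 and 0, and the prediction vectors are made-up toy values rather than outputs of any real model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_binary_classifier(vectors, labels, lr=0.5, epochs=500):
    """Fit a logistic regression by batch gradient descent on cross-entropy loss.

    vectors: prediction vectors; labels: 1 (first identifier, target model)
    or 0 (second identifier, comparison model). Both encodings are assumptions.
    """
    dim = len(vectors[0])
    n = len(vectors)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(vectors, labels):
            # Derivative of the cross-entropy loss with respect to the logit.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for k in range(dim):
                grad_w[k] += err * x[k] / n
            grad_b += err / n
        w = [wi - lr * gk for wi, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def classify(w, b, x):
    """Return 1 (first identifier) or 0 (second identifier)."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

# Toy prediction vectors (hypothetical): the target model is assumed to be
# more confident on feature-enhanced images than the comparison model.
target_vectors = [[0.90, 0.10], [0.80, 0.20], [0.95, 0.05]]
comparison_vectors = [[0.60, 0.40], [0.55, 0.45], [0.50, 0.50]]
vectors = target_vectors + comparison_vectors
labels = [1, 1, 1, 0, 0, 0]

w, b = train_binary_classifier(vectors, labels)
print([classify(w, b, x) for x in vectors])
```

In practice the first model could be any binary classifier; the point of the sketch is only the data flow: labeled prediction vectors in, a loss on the identifier mismatch minimized, a trained classifier out.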
[0021] Optionally, the training process of the comparison model includes:
[0022] Obtain the second real label corresponding to the training image, and construct the second pre-trained model;
[0023] The training image is input into the second pre-trained model, and the second pre-trained model outputs a second prediction result.
[0024] A third loss function is constructed based on the difference between the second true label and the second prediction result. The third loss function is minimized. The second pre-trained model is trained and adjusted based on the result of the minimization process of the third loss function to obtain a trained second pre-trained model. The trained second pre-trained model is used as the comparison model.
[0025] Optionally, the training process of the binary classifier includes:
[0026] Construct a second initial model;
[0027] The feature-enhanced image is input into the target model and the pre-trained comparison model respectively. The second model weight gradient of the target model with respect to the feature-enhanced image is obtained, and the second model weight gradient is marked with a preset first identifier. At the same time, the third model weight gradient of the pre-trained comparison model with respect to the feature-enhanced image is obtained, and the third model weight gradient is marked with a preset second identifier.
[0028] The second model weight gradient marked with the first identifier and the third model weight gradient marked with the second identifier are input into the second initial model, and the second initial model outputs the second prediction classification result.
[0029] A fourth loss function is constructed based on the difference between the second predicted classification result and the corresponding first or second identifier. The fourth loss function is minimized, and the second initial model is trained and adjusted according to the result of the minimization to obtain the second model. The second model is then added to the binary classifier.
[0030] Optionally, the method further includes:
[0031] If the first classification result is a preset first identifier, then the model to be identified is determined to have been stolen from the target model in the first theft scenario; or,
[0032] If the first classification result is a preset second identifier, then it is determined that the model to be identified was not stolen from the target model in the first theft scenario; or,
[0033] If the second classification result is a preset first identifier, then the model to be identified is determined to have been stolen from the target model in the second theft scenario; or,
[0034] If the second classification result is a preset second identifier, then it is determined that the model to be identified was not stolen from the target model in the second theft scenario.
[0035] Optionally, after the first prediction vector is input into the first model corresponding to the first theft scenario in the pre-trained binary classifier and the first classification result is output by the binary classifier, and the first model weight gradient is input into the second model corresponding to the second theft scenario in the binary classifier and the second classification result is output by the binary classifier, the method further includes:
[0036] Determine the first probability that the first classification result or the second classification result is a preset first identifier;
[0037] Determine the second probability that the first classification result or the second classification result is a preset second identifier;
[0038] If the first probability is greater than the second probability, then the model to be identified is determined to have been stolen from the target model.
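The probability comparison above can be sketched as follows. The source does not say how the two probabilities are computed; this sketch assumes they are estimated as frequencies over the classification results returned for a batch of query images, and that the first and second identifiers are encoded as 1 and 0. The function name `decide_stolen` is hypothetical.

```python
def decide_stolen(classification_results, first_identifier=1, second_identifier=0):
    """Compare the empirical probabilities of the two identifiers.

    Returns (is_stolen, p_first, p_second). Estimating the probabilities as
    frequencies over a batch of results is an assumption of this sketch.
    """
    n = len(classification_results)
    if n == 0:
        raise ValueError("at least one classification result is required")
    p_first = sum(r == first_identifier for r in classification_results) / n
    p_second = sum(r == second_identifier for r in classification_results) / n
    return p_first > p_second, p_first, p_second

# Five hypothetical classification results for one model to be identified.
is_stolen, p_first, p_second = decide_stolen([1, 1, 0, 1, 1])
print(is_stolen, p_first, p_second)  # True 0.8 0.2
```

A tie (equal probabilities) is treated as "not stolen" here because the source only states the rule for the case where the first probability is strictly greater.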
[0039] Based on the same inventive concept, a second aspect of this application provides a model theft identification device, comprising:
[0040] The acquisition module is configured to acquire a training image corresponding to a pre-trained target model, and a model to be identified corresponding to the target model;
[0041] The selection module is configured to select a target training image from the training images according to a preset conversion rate;
[0042] The conversion module is configured to convert the target training image into a grayscale image based on preset channel pixel values;
[0043] The feature enhancement module is configured to perform feature enhancement processing on the grayscale image using the Laplacian operator to obtain the feature-enhanced image;
[0044] The prediction module is configured to input the feature-enhanced image into the model to be identified, use the model to be identified to output a first prediction vector corresponding to the model to be identified, and obtain the first model weight gradient corresponding to the model to be identified and the feature-enhanced image.
[0045] The classification module is configured to input the first prediction vector into a first model, corresponding to the first theft scenario, in a pre-trained binary classifier, obtain a first classification result through the first model, and output the first classification result through the binary classifier; and, at the same time, to input the first model weight gradient into a second model, corresponding to the second theft scenario, in the binary classifier, obtain a second classification result through the second model, and output the second classification result through the binary classifier. The first classification result indicates whether the model to be identified was stolen from the target model in the first theft scenario, and the second classification result indicates whether the model to be identified was stolen from the target model in the second theft scenario.
[0046] Based on the same inventive concept, a third aspect of this application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable by the processor, wherein the processor, when executing the computer program, implements the method described in the first aspect above.
[0047] Based on the same inventive concept, a fourth aspect of this application provides a non-transitory computer-readable storage medium that stores computer instructions for causing a computer to perform the method described in the first aspect above.
[0048] As can be seen from the above, the model theft identification method and related equipment provided in this application input the feature-enhanced image into the model to be identified, use the model to be identified to output the first prediction vector, and obtain the first model weight gradient of the model to be identified with respect to the feature-enhanced image. Because only feature-enhanced images are used, the prediction vector and model structure of the target model need not be modified, so the prediction performance of the target model is unaffected and the user experience is preserved. In addition, the first prediction vector is input into the first model, corresponding to the first theft scenario, in the pre-trained binary classifier to obtain the first classification result; at the same time, the first model weight gradient is input into the second model, corresponding to the second theft scenario, in the binary classifier to obtain the second classification result. This achieves model theft identification that applies to multiple theft scenarios at once and avoids problems such as false alarms and bypass by adaptive attacks.
Attached Figure Description
[0049] To more clearly illustrate the technical solutions in this application or related technologies, the drawings used in the description of the embodiments or related technologies will be briefly introduced below. Obviously, the drawings described below are only embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0050] Figure 1 is a flowchart of the model theft identification method according to an embodiment of this application;
[0051] Figure 2 is a schematic diagram of the model theft identification process framework in an embodiment of this application;
[0052] Figure 3 is a structural block diagram of the model theft identification device according to an embodiment of this application;
[0053] Figure 4 is a schematic diagram of an electronic device according to an embodiment of this application.
Detailed Implementation
[0054] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with specific embodiments and the accompanying drawings.
[0055] It should be noted that, unless otherwise defined, the technical or scientific terms used in the embodiments of this application should have the ordinary meaning understood by one of ordinary skill in the art to which this application pertains. The terms "first," "second," and similar terms used in the embodiments of this application do not indicate any order, quantity, or importance, but are merely used to distinguish different components. Terms such as "comprising" or "including" mean that the element or object preceding the word encompasses the elements or objects listed after the word and their equivalents, without excluding other elements or objects. Terms such as "connected" or "linked" are not limited to physical or mechanical connections, but can include electrical connections, whether direct or indirect. Terms such as "upper," "lower," "left," and "right" are only used to indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may also change accordingly.
[0056] It is understood that before using the technical solutions of the various embodiments in this application, users will be informed of the type, scope of use, and usage scenarios of the personal information involved in an appropriate manner, and user authorization will be obtained.
[0057] For example, upon receiving a user's active request, a prompt message is sent to the user to explicitly inform them that the requested operation will require the acquisition and use of the user's personal information. This allows the user to independently choose, based on the prompt message, whether to provide personal information to the software or hardware such as electronic devices, applications, servers, or storage media performing the operations described in this application.
[0058] As an optional but not limited implementation, in response to a user's active request, sending a prompt message to the user can be done via a pop-up window, where the prompt message can be presented in text format. Furthermore, the pop-up window can also include a selection control allowing the user to choose "agree" or "disagree" to provide personal information to the electronic device.
[0059] It is understood that the above notification and user authorization process is merely illustrative and does not limit the implementation of this application. Other methods that comply with relevant laws and regulations may also be applied to the implementation of this application.
[0060] The embodiments of this application will be described in detail below with reference to the accompanying drawings.
[0061] Model theft attacks, also known as model extraction attacks, aim to acquire knowledge about a target model, including weight parameters, private data, and model structure. Using this knowledge, attackers can construct a model with similar performance to the target model or further solve for its internal parameters, such as decrypting the model or leaking private information. Model theft attacks can be implemented in various ways, including white-box and black-box attacks. In black-box attacks, attackers can only use the target model's inputs and outputs to infer its internal structure and parameters. A common example is Machine Learning as a Service (MLaaS)-based model theft attacks, where attackers can only query the model through an Application Programming Interface (API). By continuously querying the model and using the returned data, attackers train a model highly similar to the target model. The significance of model theft lies in the fact that training a model from scratch is extremely costly, while stealing a pre-trained model is significantly less costly. In white-box attacks, attackers can obtain complete information about the target model, including its structure and parameters. Therefore, white-box model theft attacks are generally more effective than black-box attacks. By stealing the model, attackers can obtain a copy of the target model, which they can then use for profit or to generate adversarial examples for further attacks. For models with high training costs, the harm caused by model theft is enormous.
[0062] Existing defenses against model theft approach the problem from two angles, each suited to different attack scenarios: (1) active defense and (2) passive defense.
[0063] (1) Active defense
[0064] The purpose of active defense is to mitigate anticipated attacks: it interferes with the attacker's attack behavior or prevents the attacker from stealing the complete model assets, so that the stolen model cannot provide normal functionality, thereby preventing the attack or at least reducing its effectiveness. Active defense primarily targets the effectiveness of attacks; that is, it does not prevent the attacker from obtaining a model, but aims to make the stolen model too low in quality to be usable.
[0065] (2) Passive defense
[0066] The purpose of passive defense is to detect attacks (in progress or in the past). Passive defense does not prevent the model from being stolen, but rather notifies the model owner of the event or tells the model owner that the third-party model was stolen from their model. Compared to active defense, passive defense does not affect the occurrence of theft attacks, but rather defends against them from the perspective of proving that an attack is happening or has already happened.
[0067] Existing defenses against model theft each have drawbacks. Among active defenses, perturbing the model output can negatively impact the user experience, and defenses based on model modification are effective only against attacks that steal the model architecture, not the model parameters. Among passive defenses, model watermarks are vulnerable to adaptive attacks that bypass them and prevent the specific backdoor from being triggered. Furthermore, passive dataset inference (DI) defenses rely on features of the attacked model's dataset for ownership verification; however, if a legitimate model is trained on a shared public dataset, false positives are likely, and the method is susceptible to post-processing attacks that weaken dataset features, rendering verification impossible. Additionally, while passive ownership verification based on external feature embedding offers a new approach, the embedded external features affect the original model's prediction vector, and the method is applicable only to scenarios with full access to the model being verified.
[0068] As can be seen from the current status and shortcomings of the above model theft defense schemes, model theft defense has gradually shifted from active schemes that disturb or modify the model to schemes based on proof of ownership. Compared with disturbing or modifying the model, ownership-proof schemes do not affect the normal user experience and can prove the ownership of a model to be verified, which active defense schemes cannot do. However, existing ownership-proof schemes suffer from insufficient coverage of use cases, susceptibility to false positives, and easy bypass by adaptive attacks.
[0069] The embodiments of this application provide a model theft identification method. A feature-enhanced image is input into the model to be identified. The model outputs a first prediction vector corresponding to the model and obtains the first model weight gradient corresponding to the feature-enhanced image. By using the feature-enhanced image, no modification to the prediction vector and model structure of the target model is required, thus not affecting the prediction performance of the target model and ensuring the user experience. Furthermore, the first prediction vector is input into a pre-trained binary classifier corresponding to a first theft scenario, resulting in a first classification result. Simultaneously, the first model weight gradient is input into a binary classification model corresponding to a second theft scenario, resulting in a second classification result. This achieves model theft identification applicable to multiple theft scenarios, avoiding false alarms and being bypassed by adaptive attacks.
[0070] As shown in Figure 1, the method in this embodiment includes:
[0071] Step 101: Obtain the training image corresponding to the pre-trained target model, and the model to be identified corresponding to the target model.
[0072] In this step, the target model is the model that may have been stolen (the victim model), and the model to be identified is the suspect model that may have been stolen from the target model.
[0073] If the model to be identified is stolen from the target model, the model to be identified will retain some of the target model's private features. By extracting or identifying these private features and then detecting whether the model to be identified has these features, it can be proven whether the model to be identified is stolen from the target model. Therefore, it is necessary to obtain training images corresponding to the pre-trained target model.
[0074] Step 102: Select a target training image from the training images according to a preset conversion rate.
[0075] In this step, the proportion of training images randomly selected by the defender is called the conversion rate. The preset conversion rate can be set according to specific circumstances, and no specific limitation is made here.
[0076] Select a specified proportion (i.e., a preset conversion rate) of training images from the training images as target training images.
[0077] For example, if the preset conversion rate is λ% and the training images are ..., the target training images $S_\lambda$ are obtained by selecting λ% of the training images.
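A minimal sketch of this selection step, assuming the images are addressed by index and the conversion rate is given as a fraction between 0 and 1. The function name `select_target_indices`, the fixed seed, and the rounding rule are illustrative assumptions, not details from the source.

```python
import random

def select_target_indices(num_images, conversion_rate, seed=0):
    """Randomly pick round(num_images * conversion_rate) distinct image indices.

    conversion_rate is a fraction in [0, 1] (e.g. 0.10 for 10%). The seed is
    fixed only to make the sketch reproducible.
    """
    if not 0.0 <= conversion_rate <= 1.0:
        raise ValueError("conversion_rate must be between 0 and 1")
    count = round(num_images * conversion_rate)
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_images), count))

# Select 10% of a hypothetical 1000-image training set.
indices = select_target_indices(num_images=1000, conversion_rate=0.10)
print(len(indices))  # 100
```

The images at the selected indices would then go through the grayscale conversion and feature enhancement steps, while the remaining images stay unchanged.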
[0078] Step 103: Convert the target training image into a grayscale image based on preset channel pixel values.
[0079] In this step, the target training image is converted into a grayscale image using the feature enhancer E based on preset channel pixel values, as shown below:
[0080] Y = 0.299 × R + 0.587 × G + 0.114 × B
[0081] In this context, R, G, and B represent the pixel values of the red, green, and blue channels of a color image, respectively, while Y represents the pixel values of a grayscale image.
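The conversion formula above can be applied per pixel as in the following sketch. The image representation (nested lists of (R, G, B) tuples) and the function name `rgb_to_gray` are assumptions made for illustration.

```python
def rgb_to_gray(image):
    """Convert rows of (R, G, B) pixels to luminance Y = 0.299R + 0.587G + 0.114B."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in image]

# A 2x2 toy image: pure red, pure green, pure blue, and white.
image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
gray = rgb_to_gray(image)
print(gray)
```

Because the three weights sum to 1, a white pixel keeps its full intensity, and green contributes most to the perceived brightness.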
[0082] Step 104: Perform feature enhancement processing on the grayscale image using the Laplacian operator to obtain the feature-enhanced image.
[0083] In this step, the Laplacian operator is used for image filtering. Laplacian is a second-order differential calculation. By applying the Laplacian filter to the original grayscale image, edges and details in the grayscale image are enhanced, resulting in image sharpening, enhanced texture and shape features, and a feature-enhanced image. For digital images, the Laplacian operator can be simplified to:
[0084] g(i,j) = 4f(i,j) - f(i+1,j) - f(i-1,j) - f(i,j+1) - f(i,j-1).
[0085] Neural networks learn the texture and edge features of images. By enhancing the texture and edge features of images, the features of the dataset learned by the deep learning model can be strengthened, thereby achieving the effect of feature enhancement.
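A minimal sketch of the filtering step, using the standard four-neighbour discrete Laplacian g(i,j) = 4f(i,j) - f(i+1,j) - f(i-1,j) - f(i,j+1) - f(i,j-1). Two points are assumptions of this sketch, since the source does not specify them: border pixels are handled by replicating the nearest edge pixel, and the function returns the raw filter response rather than adding it back to the original image.

```python
def laplacian_filter(gray):
    """Apply the four-neighbour discrete Laplacian to a 2-D grayscale image.

    gray: list of rows of numbers. Out-of-range neighbours are replaced by
    the nearest edge pixel (an assumption; the source gives no border rule).
    """
    height, width = len(gray), len(gray[0])

    def f(i, j):
        # Clamp indices so border pixels reuse their nearest neighbour.
        i = min(max(i, 0), height - 1)
        j = min(max(j, 0), width - 1)
        return gray[i][j]

    return [[4 * f(i, j) - f(i + 1, j) - f(i - 1, j) - f(i, j + 1) - f(i, j - 1)
             for j in range(width)]
            for i in range(height)]

# A single bright pixel on a dark background produces a strong response
# at the pixel and negative responses at its four neighbours.
spot = [[0, 0, 0],
        [0, 10, 0],
        [0, 0, 0]]
print(laplacian_filter(spot))  # [[0, -10, 0], [-10, 40, -10], [0, -10, 0]]
```

Flat regions map to zero, so only edges and fine texture survive, which is the property the feature enhancement relies on.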
[0086] Step 105: Input the feature-enhanced image into the model to be identified, use the model to be identified to output a first prediction vector corresponding to the model to be identified, and obtain the first model weight gradient corresponding to the model to be identified and the feature-enhanced image.
[0087] In this step, the embedded external features have no explicit representation and, unlike in model theft defenses based on model watermarking or model backdoors, they do not affect the model's classification results. For a trained model, the embedded features cause only slight differences in parameters and output prediction vectors compared with a model without embedded features. Therefore, an additional binary classifier is needed to confirm, for each scenario, whether the model to be identified has the embedded features corresponding to the feature-enhanced image.
[0088] Since in a black-box scenario, the defender can only obtain the output vector of the model to be identified for verification, the feature-enhanced image is used as a query to obtain the first predicted vector corresponding to the model to be identified, so as to use the first predicted vector as the input of the binary classifier.
[0089] For ownership verification in a white-box scenario, the defender has the right to access the model architecture and parameters of the model to be identified. Compared to the black-box scenario where only the model's prediction vector can be obtained, the defender can obtain more information about the model to be identified. In this case, the first model weight gradient corresponding to the model to be identified and the feature-enhanced image is selected as the input for binary classification.
[0090] Existing active-defense schemes against model theft attacks include data perturbation and model modification. Data perturbation prevents query-based model theft attacks by adding extra perturbations to the model's prediction vectors, while model modification prevents the model's structure from being stolen by actively modifying that structure. However, data perturbation affects the model's normal prediction results and the experience of normal users, and model modification changes the model architecture and thus affects the model's prediction accuracy.
[0091] This application uses a model theft defense method based on ownership verification. It does not modify the model's prediction vector or model structure; it only applies feature enhancement to images and does not change the dataset labels. It therefore does not affect the model's prediction results, has no impact on the experience of normal users, and is transparent to them; it affects only attackers.
[0092] Step 106: Input the first prediction vector into the first model, corresponding to the first theft scenario, in the pre-trained binary classifier, obtain the first classification result through the first model, and output the first classification result through the binary classifier. At the same time, input the first model weight gradient into the second model, corresponding to the second theft scenario, in the binary classifier, obtain the second classification result through the second model, and output the second classification result through the binary classifier. The first classification result indicates whether the model to be identified was stolen from the target model in the first theft scenario, and the second classification result indicates whether the model to be identified was stolen from the target model in the second theft scenario.
[0093] Existing defense schemes that verify ownership based on model features have two shortcomings: some consider only white-box verification of suspicious models, in which the defender needs full access to the model, and ignore black-box scenarios; others support black-box scenarios but suffer from false alarms and limited effectiveness.
[0094] A binary classifier is used to perform a final ownership verification step on the model to be identified. The binary classifier has different verification procedures for black-box scenarios (i.e., the first theft scenario) and white-box scenarios (i.e., the second theft scenario).
[0095] (1) Ownership verification in black-box scenarios
[0096] In ownership verification in a black-box scenario, the defender (e.g., a legitimate owner with access to the target model) has only query access to the model M to be identified and can obtain only its prediction vector M(x). Using the pre-trained binary classifier $C_B$ for the black-box scenario (i.e., the first model), m feature-enhanced images $x_E$ are drawn from the feature-enhanced dataset $S_E$ and input into the model to be identified; the result $C_B(M(x_E))$ is then used to determine whether the model to be identified was stolen from the target model, where $M(x_E)$ denotes the first prediction vector and $C_B(M(x_E))$ denotes the first classification result.
[0097] (2) Ownership verification in a white-box scenario: In this case, the defender has white-box access rights to the model M to be identified. The trained binary classifier C for white-box scenarios is then used. W and feature augmentation dataset S E Extract m feature-enhanced images x from them E Input the model to be identified, and then use C... W (g M (x E The result of the test model is used to determine whether the model to be identified was stolen from the target model. Here, g M (x E ) is represented as:
[0098]
[0099] Here, g_M(x_E) is the sign of the weight gradient vector of the model to be identified with respect to the feature-enhanced image x_E, referred to as the first model weight gradient; C_W(g_M(x_E)) indicates the second classification result.
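The sign-of-gradient feature g_M(x_E) can be illustrated with a minimal sketch. The formula itself is not reproduced in this text, so everything below is an assumption made for illustration: a linear softmax classifier stands in for the model to be identified, cross-entropy stands in for the loss, and the names `softmax`, `sign`, and `sign_weight_gradient` are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sign(v: float) -> int:
    """Sign function: -1, 0 or +1."""
    return (v > 0) - (v < 0)


def sign_weight_gradient(weights: List[List[float]], x: List[float], label: int) -> List[int]:
    """Flattened sign vector of d(cross-entropy)/d(weights).

    Assumption: the model is logits = weights @ x followed by softmax.
    For that model, dL/dW[k][j] = (p_k - onehot_k) * x_j.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    probs = softmax(logits)
    signs: List[int] = []
    for k, p in enumerate(probs):
        err = p - (1.0 if k == label else 0.0)  # gradient w.r.t. logit k
        signs.extend(sign(err * xi) for xi in x)
    return signs


# Toy example: 2 classes, 2 input features, all-zero weights.
weights = [[0.0, 0.0], [0.0, 0.0]]
g = sign_weight_gradient(weights, x=[1.0, -2.0], label=0)
print(g)
```

Only the direction of each weight update survives the sign function, so the resulting vector has one entry in {-1, 0, +1} per model weight regardless of the gradient's magnitude.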
[0100] Existing model theft defense schemes based on ownership verification include dataset inference and external feature embedding techniques. Dataset inference uses inherent features of the model's dataset for ownership verification, but it only extracts these inherent features without further processing. Its core idea is to verify ownership by identifying whether the suspicious model contains knowledge that the victim model learned from its private training dataset. While it supports both black-box and white-box verification scenarios, it is vulnerable to adaptive attacks and also suffers from false positives. To address this issue, existing techniques propose actively embedding external features into the model as its own features for verification. Although this solves the bypass and false positive problems of dataset inference, the embedding of external features means it only supports ownership verification in white-box scenarios and not in black-box scenarios.
[0101] Existing model theft defense schemes based on ownership verification cannot achieve good defense performance while fully covering both black-box and white-box verification scenarios. To address this issue, this application proposes a novel defense framework. It designs defense schemes for both black-box and white-box verification scenarios and, building on the idea of external feature embedding, introduces a scheme that enhances the features of the model's own dataset. This framework covers all ownership verification scenarios and has high verification accuracy. Compared to dataset inference, it avoids the risks of false positives and of being bypassed by adaptive attacks, so it achieves good ownership verification performance while ensuring comprehensive scenario coverage.
[0102] The above scheme inputs the feature-enhanced image into the model to be identified, uses the output of the model to be identified to produce a first prediction vector corresponding to the model, and obtains the first model weight gradient corresponding to the model to be identified and the feature-enhanced image. By using the feature-enhanced image, there is no need to modify the prediction vector and model structure of the target model, so it will not affect the prediction effect of the target model and ensure the user's experience with the target model. In addition, by inputting the first prediction vector into the first model corresponding to the first theft scenario in the pre-trained binary classifier, the first classification result is obtained through the first model. At the same time, the first model weight gradient is input into the second model corresponding to the second theft scenario in the binary classification model, and the second classification result is obtained through the second model. This achieves model theft identification that is applicable to multiple theft scenarios, avoiding false alarms and being bypassed by adaptive attacks.
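The two-branch identification flow described above can be sketched as follows. This is a minimal illustration, not the implementation of this application: the function `identify_model` and all toy stand-ins (`toy_predict`, `toy_grad_sign`, `toy_cb`, `toy_cw`) are hypothetical, and the models and classifiers are passed in as plain callables.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


def identify_model(
    enhanced_images: Sequence[Vector],
    predict: Callable[[Vector], Vector],
    classifier_cb: Callable[[Vector], int],
    weight_grad_sign: Optional[Callable[[Vector], List[int]]] = None,
    classifier_cw: Optional[Callable[[List[int]], int]] = None,
) -> Tuple[List[int], Optional[List[int]]]:
    """Return (first classification results, second classification results).

    The black-box branch only needs query access (predict).
    The white-box branch runs only when gradient access is available.
    """
    first = [classifier_cb(predict(x_e)) for x_e in enhanced_images]  # C_B(M(x_E))
    second = None
    if weight_grad_sign is not None and classifier_cw is not None:
        second = [classifier_cw(weight_grad_sign(x_e)) for x_e in enhanced_images]  # C_W(g_M(x_E))
    return first, second


# Hypothetical toy stand-ins for the model to be identified and the classifiers.
def toy_predict(x_e: Vector) -> Vector:
    return [0.9, 0.1]


def toy_grad_sign(x_e: Vector) -> List[int]:
    return [1, -1]


def toy_cb(p: Vector) -> int:
    return 1 if max(p) > 0.8 else -1


def toy_cw(g: List[int]) -> int:
    return 1 if g[0] > 0 else -1


images = [[0.1, 0.2], [0.3, 0.4]]
black_box_only = identify_model(images, toy_predict, toy_cb)
both = identify_model(images, toy_predict, toy_cb, toy_grad_sign, toy_cw)
print(black_box_only, both)
```

Neither branch modifies the target model or its prediction vectors; both only read outputs or gradients of the model to be identified.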
[0103] In some embodiments, the training process of the target model includes:
[0104] Step A1: Obtain the first real label corresponding to the training image and construct the first pre-trained model.
[0105] Step A2: Input the feature-enhanced image and other training images (excluding the target training image) into the first pre-trained model, and output the first prediction result through the first pre-trained model.
[0106] Step A3: Construct a first loss function based on the difference between the first true label and the first prediction result, minimize the first loss function, and train and adjust the first pre-trained model based on the result of the minimization process to obtain a trained first pre-trained model. Learn the features corresponding to the feature-enhanced image into the trained first pre-trained model, and use the trained first pre-trained model as the target model.
[0107] In the above scheme, the feature-enhanced images S_E and the other training images (excluding the target training images) are used together as the training data S_T to train the first pre-trained model, so that the enhanced private features of S_E are embedded into the target model V_θ. These enhanced features are learned by the first pre-trained model during training using the following formula:
[0108]
[0109] where the loss term in the formula above is the first loss function.
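How the training data S_T is assembled from S_E and the untouched images can be sketched as below. This is an illustrative assumption about the bookkeeping only: `build_training_set`, its `seed` parameter, and the toy `enhance` callable are hypothetical, and the real enhancement is the grayscale and Laplacian processing described elsewhere in this application.

```python
import random
from typing import Callable, List, Sequence, Tuple

Image = List[float]
Sample = Tuple[Image, int]


def build_training_set(
    images: Sequence[Image],
    labels: Sequence[int],
    conversion_rate: float,
    enhance: Callable[[Image], Image],
    seed: int = 0,
) -> Tuple[List[Sample], List[Sample]]:
    """Return (S_T, S_E).

    A fraction `conversion_rate` of the images is selected as target training
    images and replaced by their enhanced versions; labels are kept unchanged.
    """
    rng = random.Random(seed)
    n_convert = int(len(images) * conversion_rate)
    chosen = set(rng.sample(range(len(images)), n_convert))
    s_t: List[Sample] = []
    s_e: List[Sample] = []
    for i, (img, y) in enumerate(zip(images, labels)):
        if i in chosen:
            img_e = enhance(img)
            s_e.append((img_e, y))  # feature-enhanced image keeps its label
            s_t.append((img_e, y))
        else:
            s_t.append((img, y))    # other training images are untouched
    return s_t, s_e


# Toy data: ten one-pixel "images"; the stand-in enhancement adds 100.
images = [[float(i)] for i in range(10)]
labels = list(range(10))
s_t, s_e = build_training_set(images, labels, 0.2, enhance=lambda img: [v + 100.0 for v in img])
print(len(s_t), len(s_e))
```

The target model is then trained on S_T with an ordinary supervised loss, which is how the enhanced features end up embedded in V_θ.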
[0110] In some embodiments, the training process of the binary classifier includes:
[0111] Step B1: Construct the first initial model.
[0112] Step B2: Input the feature-enhanced image into the target model and the pre-trained comparison model respectively. Use the target model to output the corresponding second prediction vector and mark the second prediction vector with a preset first identifier. At the same time, use the pre-trained comparison model to output the corresponding third prediction vector and mark the third prediction vector with a preset second identifier.
[0113] Step B3: Input the second prediction vector labeled with the first identifier and the third prediction vector labeled with the second identifier into the first initial model, and output the first prediction classification result through the first initial model.
[0114] Step B4: Construct a second loss function based on the difference between the first predicted classification result and the first identifier or the second identifier, minimize the second loss function, and train and adjust the first initial model according to the result of the minimization process to obtain the first model. Add the first model to the binary classifier.
[0115] In the above scheme, for ownership verification in a black-box scenario (i.e., the first theft scenario), the defender can only obtain the output vector of the model to be identified for verification. Therefore, the binary classifier C_B (i.e., the first model) is trained using the prediction vectors of the target model V_θ, which is trained on the feature-enhanced dataset S_E (i.e., the feature-enhanced images), and of a benign model B (i.e., the comparison model), which is trained on the original private dataset S (i.e., the training images).
[0116]
[0117] The defender uses the feature-enhanced dataset S_E (i.e., the feature-enhanced images) as queries and inputs it into both the target model and the pre-trained comparison model, obtaining the second prediction vector output by the target model and the third prediction vector output by the pre-trained comparison model. The second prediction vector from the target model is labeled +1 (i.e., the first identifier), and the third prediction vector from the pre-trained comparison model is labeled -1 (i.e., the second identifier). Together they form the dataset S_B of the binary classifier C_B, which can be represented as:
[0118] $S_B = \{(V_\theta(x_E), +1) \mid x_E \in S_E\} \cup \{(B(x_E), -1) \mid x_E \in S_E\}$
[0119] Where V_θ(x_E) is the prediction vector of the target model for the dataset S_E (i.e., the second prediction vector) and B(x_E) is the prediction vector of the comparison model for the dataset S_E (i.e., the third prediction vector). The second prediction vectors labeled with the first identifier and the third prediction vectors labeled with the second identifier are used as the dataset to train the first model. At this time, what the first model learns (i.e., the second loss function) is as follows:
[0120]
[0121] Intuitively speaking, because the comparison model never saw samples of the feature-enhanced dataset S_E during training, the class probabilities in its prediction vectors for S_E are relatively even. The training set of the target model includes the S_E samples, so the top-ranked entry in its prediction vectors for S_E has a higher probability.
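The construction of S_B and the intuition about "even" versus "peaked" prediction vectors can be illustrated with a small sketch. It is an assumption-laden stand-in: `build_s_b` and `fit_confidence_threshold` are hypothetical names, and a single threshold on the top probability replaces the trained first model purely to keep the example short.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def build_s_b(
    target_predict: Callable[[Vector], Vector],
    benign_predict: Callable[[Vector], Vector],
    enhanced_images: Sequence[Vector],
) -> List[Tuple[Vector, int]]:
    """Label target-model prediction vectors +1 and comparison-model ones -1."""
    s_b: List[Tuple[Vector, int]] = []
    for x_e in enhanced_images:
        s_b.append((target_predict(x_e), +1))  # second prediction vector
        s_b.append((benign_predict(x_e), -1))  # third prediction vector
    return s_b


def fit_confidence_threshold(s_b: List[Tuple[Vector, int]]) -> Callable[[Vector], int]:
    """Simplified stand-in for C_B: threshold halfway between the mean
    top probability of the +1 samples and that of the -1 samples."""
    pos = [max(p) for p, y in s_b if y == +1]
    neg = [max(p) for p, y in s_b if y == -1]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda p: 1 if max(p) > threshold else -1


# Toy models: the target is confident on S_E, the comparison model is not.
def toy_target(x_e: Vector) -> Vector:
    return [0.9, 0.05, 0.05]


def toy_benign(x_e: Vector) -> Vector:
    return [0.4, 0.3, 0.3]


s_b = build_s_b(toy_target, toy_benign, [[0.0], [1.0]])
c_b = fit_confidence_threshold(s_b)
print(c_b([0.85, 0.10, 0.05]), c_b([0.34, 0.33, 0.33]))
```

A trained classifier can use the whole prediction vector rather than only its largest entry; the threshold is just the simplest rule that separates the two toy groups.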
[0122] In some embodiments, the training process of the comparison model includes:
[0123] Step C1: Obtain the second real label corresponding to the training image and construct the second pre-trained model.
[0124] Step C2: Input the training image into the second pre-trained model, and output the second prediction result through the second pre-trained model.
[0125] Step C3: Construct a third loss function based on the difference between the second true label and the second prediction result, minimize the third loss function, and train and adjust the second pre-trained model according to the result of the minimization process of the third loss function to obtain a trained second pre-trained model, and use the trained second pre-trained model as the comparison model.
[0126] In the above scheme, to ensure the accuracy of the prediction results, a model that was not obtained through a theft attack needs to be trained as a benign model (i.e., the comparison model) B. This model is trained using the original dataset S without feature enhancement. It does not learn the enhanced features corresponding to the feature-enhanced images, but only learns the features of the original dataset. The benign model learns the features of the original dataset S (i.e., the training images) through the following formula (i.e., the third loss function):
[0127]
[0128] Intuitively, converting a small number of samples into sharpened grayscale images does not change the original labels of the images. This can be considered a form of feature enhancement that preserves the original label and has no impact on the normal function of the target model. Furthermore, because neural networks learn the texture and edge features of images, the enhanced features are embedded into the target model V_θ. Meanwhile, a benign model B is trained using the original dataset S without feature enhancement. The benign model learns the features of the original dataset S and does not have the enhanced features learned by the target model. Serving as the non-stolen control group, the benign model participates in the training of the binary classifier.
[0129] In some embodiments, the training process of the binary classifier includes:
[0130] Step D1: Construct the second initial model.
[0131] Step D2: Input the feature-enhanced image into the target model and the pre-trained comparison model respectively, obtain the second model weight gradient of the target model with respect to the feature-enhanced image, and mark the second model weight gradient with a preset first identifier. At the same time, obtain the third model weight gradient of the pre-trained comparison model with respect to the feature-enhanced image, and mark the third model weight gradient with a preset second identifier.
[0132] Step D3: Input the second model weight gradient marked with the first identifier and the third model weight gradient marked with the second identifier into the second initial model, and output the second prediction classification result through the second initial model.
[0133] Step D4: Based on the second predicted classification result, construct a fourth loss function with the difference between the first identifier and the second identifier, minimize the fourth loss function, train and adjust the second initial model according to the result of the fourth loss function during the minimization process, obtain the second model, and add the second model to the binary classifier.
[0134] In the above scheme, for ownership verification in the white-box scenario (i.e., the second theft scenario), the defender has the right to access the model architecture and parameters of the model to be identified. Compared to the black-box scenario, where only the model's prediction vector can be obtained, more information about the model to be identified is available. In this case, the gradient vector of the model weights is selected as the feature used to train the binary classifier C_W (i.e., the second model).
[0135]
[0136] The defender inputs the feature-enhanced dataset S_E (i.e., the feature-enhanced images) into the target model and the pre-trained comparison model respectively, and obtains the model weight gradients of the two models with respect to S_E. The weight gradient of the target model (i.e., the second model weight gradient) is labeled +1 (i.e., the first identifier), and the weight gradient of the comparison model (i.e., the third model weight gradient) is labeled -1 (i.e., the second identifier). Together they form the dataset S_W of the binary classifier C_W, which can be represented as:
[0137] $S_W = \{(g_V(x_E), +1) \mid x_E \in S_E\} \cup \{(g_B(x_E), -1) \mid x_E \in S_E\}$
[0138] Let sign(·) be the sign function. Then g_V(x_E) and g_B(x_E), the signs of the model weight gradient vectors of the target model and the comparison model respectively, are:
[0139]
[0140]
[0141] Here, a sign vector is used to represent the gradient of the model itself. That is, the gradient vector of the model weights is further processed by the sign(·) function, which highlights the influence of the gradient direction.
[0142] S_W is used as the dataset to train the binary classifier C_W. What the binary classifier learns is represented by the following formula:
[0143]
[0144] Compared to the black-box scenario (i.e., the first theft scenario), the ownership validator trained in the white-box scenario uses the model's gradient vector instead of the model's prediction vector. The gradient vector contains more feature information learned by the model.
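The white-box training data S_W and a classifier trained on it can be sketched as follows. This is a minimal illustration under stated assumptions: `build_s_w` and `train_perceptron` are hypothetical names, the gradient sign vectors come from toy callables, and a perceptron stands in for the second initial model, whose architecture and loss are not specified in this text.

```python
from typing import Callable, List, Sequence, Tuple

SignVector = List[int]


def build_s_w(
    grad_sign_target: Callable[[object], SignVector],
    grad_sign_benign: Callable[[object], SignVector],
    enhanced_images: Sequence[object],
) -> List[Tuple[SignVector, int]]:
    """Label target-model gradient signs +1 and comparison-model ones -1."""
    s_w: List[Tuple[SignVector, int]] = []
    for x_e in enhanced_images:
        s_w.append((grad_sign_target(x_e), +1))  # g_V(x_E)
        s_w.append((grad_sign_benign(x_e), -1))  # g_B(x_E)
    return s_w


def train_perceptron(dataset: List[Tuple[SignVector, int]], epochs: int = 10) -> Callable[[SignVector], int]:
    """Classic perceptron rule; an assumed stand-in for C_W."""
    dim = len(dataset[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in dataset:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified or on the boundary: update
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Toy gradient-sign extractors (hypothetical): opposite directions per model.
s_w = build_s_w(lambda x: [1, -1, 1, -1], lambda x: [-1, 1, -1, 1], enhanced_images=[0])
c_w = train_perceptron(s_w)
print(c_w([1, -1, 1, 1]), c_w([-1, 1, -1, -1]))
```

Because each input has one entry per model weight, a practical C_W would have to handle very high-dimensional sign vectors; the four-dimensional toy vectors only show the data flow.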
[0145] In some embodiments, the method further includes:
[0146] Step E1: In response to the first classification result being a preset first identifier, the model to be identified is determined to have been stolen from the target model in the first theft scenario. Alternatively,
[0147] Step E2: In response to the first classification result being a preset second identifier, it is determined that the model to be identified was not stolen from the target model in the first theft scenario. Alternatively,
[0148] Step E3: In response to the second classification result being a preset first identifier, it is determined that the model to be identified was stolen from the target model in the second theft scenario. Alternatively,
[0149] Step E4: In response to the second classification result being a preset second identifier, it is determined that the model to be identified was not stolen from the target model in the second theft scenario.
[0150] In the above scheme, the first classification result is used to determine whether the model to be identified was stolen from the target model in the first theft scenario.
[0151] The second classification result is used to determine whether the model to be identified was stolen from the target model in the second theft scenario.
[0152] The first and second identifiers can be set according to specific circumstances, and no specific limitations are imposed here. For example, if the first classification result is 1, it is determined that the model to be identified was stolen from the target model in the first theft scenario, or if the second classification result is 1, it is determined that the model to be identified was stolen from the target model in the second theft scenario.
[0153] Using the first and second identifiers, it can be quickly determined whether the model to be identified was stolen from the target model in the first or second theft scenario.
[0154] In some embodiments, after step 106, the method further includes:
[0155] Step F1: Determine the first probability that the first classification result or the second classification result is a preset first identifier.
[0156] Step F2: Determine the second probability that the first classification result or the second classification result is a preset second identifier.
[0157] Step F3: In response to the first probability being greater than the second probability, the model to be identified is determined to be stolen from the target model.
[0158] In the above scheme, in the ownership verification scenarios of both the black box (i.e., the first theft scenario) and the white box (i.e., the second theft scenario), the selection of the feature-enhanced images x_E is random, which can affect the accuracy of ownership verification. Therefore, hypothesis testing is introduced to improve the accuracy of ownership verification. Let μ_S be the posterior probability (i.e., the first probability) that the ownership classifier (i.e., the binary classifier) classifies the model to be identified as true, that is, the probability that C_B(M(x_E)) = 1 or C_W(g_M(x_E)) = 1. Let μ_B be the posterior probability that the classifier classifies the comparison model as true, that is, the probability that C_B(B(x_E)) = 1 or C_W(g_B(x_E)) = 1. Because the comparison model uses the original dataset S (i.e., the training images) during training, it should not be classified as a stolen model, so μ_B (i.e., the second probability) should be as small as possible. Given the null hypothesis H0: μ_S ≤ μ_B (H1: μ_S > μ_B), the model M to be identified can be considered stolen from the target model V_θ if and only if H0 is rejected.
[0159] In the actual verification process in a white-box scenario, m different feature-enhanced images x_E are randomly selected from the feature-enhanced dataset S_E and input into the model to be identified, a paired-sample t-test is performed, and its p-value is calculated. H0 is rejected when the p-value is less than the significance level α. In addition, the confidence score Δμ = μ_S - μ_B is calculated to represent the confidence level of the verification. The larger Δμ is, the higher the credibility of the verification.
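The hypothesis test and the confidence score Δμ can be sketched with the standard library alone. This is an illustration under assumptions, not the procedure of this application: the names `paired_t_statistic` and `verify_ownership` are hypothetical, the scores are made-up toy values, and instead of computing a p-value the sketch compares the t statistic with a one-sided critical value supplied by the caller, which is equivalent to testing p < α for the same α and degrees of freedom.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence


def paired_t_statistic(suspect_scores: Sequence[float], benign_scores: Sequence[float]) -> float:
    """t statistic of the paired differences (suspect minus comparison)."""
    diffs = [s - b for s, b in zip(suspect_scores, benign_scores)]
    m = len(diffs)
    if m < 2:
        raise ValueError("need at least two paired samples")
    sd = stdev(diffs)
    if sd == 0.0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / math.sqrt(m))


def verify_ownership(
    suspect_scores: Sequence[float],
    benign_scores: Sequence[float],
    t_critical: float,
) -> Dict[str, object]:
    """Reject H0 (mu_S <= mu_B) when t exceeds the one-sided critical value."""
    t_stat = paired_t_statistic(suspect_scores, benign_scores)
    delta_mu = mean(suspect_scores) - mean(benign_scores)
    return {"t": t_stat, "delta_mu": delta_mu, "reject_h0": t_stat > t_critical}


# Toy posterior probabilities of the "stolen" class on m = 5 sampled x_E.
suspect = [0.90, 0.80, 0.95, 0.85, 0.90]   # classifier applied to the model to be identified
benign = [0.20, 0.30, 0.25, 0.10, 0.20]    # classifier applied to the comparison model
# 2.132 is the one-sided t critical value for alpha = 0.05 with 4 degrees of freedom.
result = verify_ownership(suspect, benign, t_critical=2.132)
print(result)
```

With a statistics library available, the p-value itself could be computed from the same paired differences; the decision is unchanged as long as the test is one-sided in the direction μ_S > μ_B.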
[0160] In some embodiments, the framework of the model theft identification process is shown in Figure 2 and mainly includes three steps:
[0161] (1) Dataset feature enhancement
[0162] By enhancing the texture features of the images in the model owner's private dataset S (i.e., the training images) to obtain S_E, the victim model (i.e., the target model) can learn more features from the private dataset. After an attacker performs a model theft attack, more features are transferred to the attacker's model (i.e., the model to be identified), which enables better subsequent feature extraction and ownership verification. To enhance image texture and edge features, the grayscale image is sharpened using a Laplacian transform to highlight its texture and edge features, converting a portion of the private dataset images into the feature-enhanced images S_E.
[0163] (2) Training the ownership validator
[0164] In the ownership validator training module, different feature extraction schemes are designed for the white-box (i.e., the second theft scenario) and black-box (i.e., the first theft scenario) defense scenarios, and an ownership validator is trained for each scenario. The features extracted from the model owner's model are defined as private features, and the features extracted from a benign model (i.e., the comparison model) are defined as public features. The two are combined into a dataset to train a binary classifier, whose role is to prove ownership based on the input features.
[0165] (3) Ownership verification
[0166] Based on the previously trained ownership validator and the victim's private dataset after feature enhancement, ownership verification is now performed on the suspicious model (i.e. the model to be identified). The verification scenarios are also divided into black-box and white-box verification scenarios. In the black-box scenario, the prediction vector is obtained by inputting the feature-enhanced dataset into the suspicious model, and this prediction vector is input into the ownership validator for verification. In the white-box scenario, the feature-enhanced dataset is also input into the suspicious model, and the gradient vector of the suspicious model is extracted and input into the ownership validator for verification. Finally, the accuracy of the prediction result is estimated by hypothesis testing, and the final prediction result is output.
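Step (1) of the framework, grayscale conversion followed by Laplacian sharpening, can be sketched in pure Python on nested lists. The details are assumptions made for illustration: the channel weights 0.299, 0.587 and 0.114 are the common luma weights and may differ from the preset channel pixel values of this application, the 4-neighbour Laplacian kernel is one common choice, and the names `to_grayscale` and `laplacian_sharpen` are hypothetical.

```python
from typing import List, Sequence, Tuple

Gray = List[List[float]]


def to_grayscale(
    rgb_image: Sequence[Sequence[Tuple[float, float, float]]],
    weights: Tuple[float, float, float] = (0.299, 0.587, 0.114),  # assumed weights
) -> Gray:
    """Weighted sum of the R, G and B channels for every pixel."""
    return [[sum(w * c for w, c in zip(weights, px)) for px in row] for row in rgb_image]


def laplacian_sharpen(gray: Gray) -> Gray:
    """Sharpen with the 4-neighbour Laplacian: out = gray - laplacian(gray).

    Border pixels are left unchanged; results are clipped to [0, 255].
    """
    h, w = len(gray), len(gray[0])
    out = [list(row) for row in gray]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            lap = (gray[i - 1][j] + gray[i + 1][j] + gray[i][j - 1] + gray[i][j + 1]
                   - 4 * gray[i][j])
            out[i][j] = min(255.0, max(0.0, gray[i][j] - lap))
    return out


# Toy 3x3 image: a slightly brighter centre pixel becomes more pronounced.
gray = [[90.0, 90.0, 90.0],
        [90.0, 100.0, 90.0],
        [90.0, 90.0, 90.0]]
sharp = laplacian_sharpen(gray)
white = to_grayscale([[(255.0, 255.0, 255.0)]])
print(sharp[1][1], white[0][0])
```

Flat regions have a zero Laplacian and pass through unchanged, while edges and texture are amplified, which matches the stated goal of highlighting texture and edge features without changing the image label.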
[0167] In summary, this application proposes a model theft attack defense framework from the perspective of ownership verification. Since it does not modify the model owner's model and only performs feature enhancement on part of the dataset, it does not affect the normal use of the model or its prediction results. It supports ownership proof in both white-box and black-box scenarios, covering all ownership proof scenarios.
[0168] It should be noted that the method in this embodiment can be executed by a single device, such as a computer or server. The method can also be applied in a distributed scenario, where multiple devices cooperate to complete the task. In such a distributed scenario, one of these devices may execute only one or more steps of the method in this embodiment, and the multiple devices will interact with each other to complete the method described.
[0169] It should be noted that the above description describes some embodiments of this application. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recorded in the claims can be performed in a different order than that shown in the above embodiments and still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require a specific or sequential order to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
[0170] Based on the same inventive concept, corresponding to any of the above embodiments, this application also provides a model theft identification device.
[0171] Referring to Figure 3, the model theft identification device includes:
[0172] The acquisition module 301 is configured to acquire a training image corresponding to a pre-trained target model, and a model to be identified corresponding to the target model.
[0173] The selection module 302 is configured to select a target training image from the training images according to a preset conversion rate;
[0174] The conversion module 303 is configured to convert the target training image into a grayscale image based on preset channel pixel values;
[0175] The feature enhancement module 304 is configured to perform feature enhancement processing on the grayscale image using a Laplacian transform to obtain the feature-enhanced image;
[0176] The prediction module 305 is configured to input the feature-enhanced image into the model to be identified, use the model to be identified to output a first prediction vector corresponding to the model to be identified, and obtain the first model weight gradient corresponding to the model to be identified and the feature-enhanced image.
[0177] The classification module 306 is configured to input the first prediction vector into a first model in a pre-trained binary classifier corresponding to the first theft scenario, obtain a first classification result through the first model, and output the first classification result through the binary classifier. Simultaneously, the first model weight gradient is input into a second model in the binary classifier corresponding to the second theft scenario, a second classification result is obtained through the second model, and the second classification result is output through the binary classifier. The first classification result is used to indicate whether the model to be identified was stolen from the target model in the first theft scenario, and the second classification result is used to indicate whether the model to be identified was stolen from the target model in the second theft scenario.
[0178] In some embodiments, the model theft identification device further includes a target model training module, which is specifically configured as follows:
[0179] Obtain the first real label corresponding to the training image and construct the first pre-trained model;
[0180] The feature-enhanced image and other training images in the training images other than the target training image are input into the first pre-trained model, and the first prediction result is output by the first pre-trained model.
[0181] A first loss function is constructed based on the difference between the first true label and the first prediction result. The first loss function is minimized. The first pre-trained model is trained and adjusted based on the result of the processing of the first loss function during the minimization process to obtain a trained first pre-trained model. The features corresponding to the feature-enhanced image are learned into the trained first pre-trained model. The trained first pre-trained model is used as the target model.
[0182] In some embodiments, the model theft detection device further includes a first training module for a binary classifier, wherein the first training module for the binary classifier is specifically configured as follows:
[0183] Construct the first initial model;
[0184] The feature-enhanced image is input into the target model and the pre-trained comparison model respectively. The target model outputs a corresponding second prediction vector and marks the second prediction vector with a preset first identifier. At the same time, the pre-trained comparison model outputs a corresponding third prediction vector and marks the third prediction vector with a preset second identifier.
[0185] The second prediction vector labeled with the first identifier and the third prediction vector labeled with the second identifier are input into the first initial model, and the first initial model outputs the first prediction classification result.
[0186] A second loss function is constructed based on the difference between the first predicted classification result and the first identifier or the second identifier. The second loss function is minimized. The first initial model is trained and adjusted according to the result of the minimization process to obtain the first model. The first model is then added to the binary classifier.
[0187] In some embodiments, the model theft identification device further includes a comparison model training module, which is specifically configured as follows:
[0188] Obtain the second real label corresponding to the training image, and construct the second pre-trained model;
[0189] The training image is input into the second pre-trained model, and the second pre-trained model outputs a second prediction result.
[0190] A third loss function is constructed based on the difference between the second true label and the second prediction result. The third loss function is minimized. The second pre-trained model is trained and adjusted based on the result of the minimization process of the third loss function to obtain a trained second pre-trained model. The trained second pre-trained model is used as the comparison model.
[0191] In some embodiments, the model theft detection device further includes a second training module for a binary classifier, wherein the second training module for the binary classifier is specifically configured as follows:
[0192] Construct a second initial model;
[0193] The feature-enhanced image is input into the target model and the pre-trained comparison model respectively. The second model weight gradient of the target model with respect to the feature-enhanced image is obtained, and the second model weight gradient is marked with a preset first identifier. At the same time, the third model weight gradient of the pre-trained comparison model with respect to the feature-enhanced image is obtained, and the third model weight gradient is marked with a preset second identifier.
[0194] The second model weight gradient marked with the first identifier and the third model weight gradient marked with the second identifier are input into the second initial model, and the second initial model outputs the second prediction classification result.
[0195] A fourth loss function is constructed based on the difference between the second predicted classification result and the first identifier or the second identifier. The fourth loss function is minimized. The second initial model is trained and adjusted based on the result of the fourth loss function during the minimization process to obtain the second model. The second model is then added to the binary classifier.
[0196] In some embodiments, the model theft identification device further includes a determining module, which is specifically configured to:
[0197] If the first classification result is a preset first identifier, then the model to be identified is determined to have been stolen from the target model in the first theft scenario; or,
[0198] If the first classification result is a preset second identifier, then it is determined that the model to be identified was not stolen from the target model in the first theft scenario; or,
[0199] If the second classification result is a preset first identifier, then the model to be identified is determined to have been stolen from the target model in the second theft scenario; or,
[0200] If the second classification result is a preset second identifier, then it is determined that the model to be identified was not stolen from the target model in the second theft scenario.
[0201] In some embodiments, the model theft identification device further includes a probability determination module. After the first prediction vector is input into the first model corresponding to the first theft scenario in the pre-trained binary classifier and the first classification result is obtained and output, and the first model weight gradient is input into the second model corresponding to the second theft scenario in the binary classifier and the second classification result is obtained and output, the probability determination module is specifically configured as follows:
[0202] Determine the first probability that the first classification result or the second classification result is a preset first identifier;
[0203] Determine the second probability that the first classification result or the second classification result is a preset second identifier;
[0204] If the first probability is greater than the second probability, then the model to be identified is determined to have been stolen from the target model.
[0205] For ease of description, the above device is described in terms of function and divided into various modules. Of course, in implementing this application, the functions of each module can be implemented in one or more pieces of software and/or hardware.
[0206] The apparatus of the above embodiments is used to implement the corresponding model theft identification method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiments, which will not be repeated here.
[0207] Based on the same inventive concept, corresponding to the methods of any of the above embodiments, this application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the model theft identification method described in any of the above embodiments.
[0208] Figure 4 shows a more specific hardware structure diagram of an electronic device provided in this embodiment. The device may include: a processor 401, a memory 402, an input/output interface 403, a communication interface 404, and a bus 405. The processor 401, memory 402, input/output interface 403, and communication interface 404 are interconnected internally via the bus 405.
[0209] The processor 401 can be implemented using a general-purpose CPU (Central Processing Unit), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of this specification.
[0210] The memory 402 can be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), static storage device, dynamic storage device, etc. The memory 402 can store the operating system and other applications. When the technical solutions provided in the embodiments of this specification are implemented by software or firmware, the relevant program code is stored in the memory 402 and is called and executed by the processor 401.
[0211] Input / output interface 403 is used to connect input / output modules to realize information input and output. Input / output modules can be configured as components in the device (not shown in the figure) or externally connected to the device to provide corresponding functions. Input devices may include keyboards, mice, touch screens, microphones, various sensors, etc., and output devices may include displays, speakers, vibrators, indicator lights, etc.
[0212] Communication interface 404 is used to connect a communication module (not shown in the figure) to enable communication between this device and other devices. The communication module can communicate via wired means (such as USB, Ethernet cable, etc.) or wireless means (such as mobile network, WIFI, Bluetooth, etc.).
[0213] Bus 405 includes a pathway for transmitting information between various components of the device (e.g., processor 401, memory 402, input / output interface 403, and communication interface 404).
[0214] It should be noted that although the above-described device only shows the processor 401, memory 402, input / output interface 403, communication interface 404, and bus 405, in specific implementations, the device may also include other components necessary for normal operation. Furthermore, those skilled in the art will understand that the above-described device may only include the components necessary for implementing the embodiments of this specification, and not necessarily all the components shown in the figures.
[0215] The electronic devices described above are used to implement the corresponding model theft identification methods in any of the foregoing embodiments, and have the beneficial effects of the corresponding method embodiments, which will not be repeated here.
[0216] Based on the same inventive concept, corresponding to the methods of any of the above embodiments, this application also provides a non-transitory computer-readable storage medium that stores computer instructions for causing the computer to execute the model theft identification method as described in any of the above embodiments.
[0217] The computer-readable medium of this embodiment includes permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
[0218] The computer instructions stored in the storage medium of the above embodiments are used to cause the computer to execute the model theft identification method as described in any of the above embodiments, and have the beneficial effects of the corresponding method embodiments, which will not be repeated here.
[0219] Those skilled in the art should understand that the discussion of any of the above embodiments is merely exemplary and is not intended to imply that the scope of this application (including the claims) is limited to these examples. Within the framework of this application, the technical features of the above embodiments or of different embodiments can also be combined, the steps can be implemented in any order, and there are many other variations of different aspects of the embodiments of this application as described above, which are not described in detail for the sake of brevity.
[0220] Additionally, to simplify the description and discussion, and to avoid obscuring the embodiments of this application, the well-known power / ground connections to integrated circuit (IC) chips and other components may or may not be shown in the provided drawings. Furthermore, the apparatus may be shown in block diagram form to avoid obscuring the embodiments of this application, and this also takes into account the fact that the details of the implementation of these block diagram apparatuses are highly dependent on the platform on which the embodiments of this application will be implemented (i.e., these details should be fully understood by those skilled in the art). While specific details (e.g., circuits) have been set forth to describe exemplary embodiments of this application, it will be apparent to those skilled in the art that the embodiments of this application can be implemented without these specific details or with variations thereof. Therefore, these descriptions should be considered illustrative rather than restrictive.
[0221] Although this application has been described in conjunction with specific embodiments thereof, many substitutions, modifications, and variations of these embodiments will be apparent to those skilled in the art from the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may be used with the embodiments discussed.
[0222] The embodiments of this application are intended to cover all such substitutions, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the embodiments of this application should be included within the protection scope of this application.
Claims
1. A method for identifying model theft, characterized in that it comprises: acquiring training images corresponding to a pre-trained target model, and a model to be identified corresponding to the target model; selecting a target training image from the training images according to a preset conversion rate; converting the target training image into a grayscale image based on preset channel pixel values; performing feature enhancement processing on the grayscale image using a Laplace transform to obtain a feature-enhanced image; inputting the feature-enhanced image into the model to be identified, outputting, through the model to be identified, a first prediction vector corresponding to the model to be identified, and obtaining a first model weight gradient of the model to be identified with respect to the feature-enhanced image; inputting the first prediction vector into a first model, corresponding to a first theft scenario, in a pre-trained binary classifier, obtaining a first classification result through the first model, and outputting the first classification result through the binary classifier; and simultaneously inputting the first model weight gradient into a second model, corresponding to a second theft scenario, in the binary classifier, obtaining a second classification result through the second model, and outputting the second classification result through the binary classifier; wherein the first classification result indicates whether the model to be identified was stolen from the target model in the first theft scenario, and the second classification result indicates whether the model to be identified was stolen from the target model in the second theft scenario.
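The selection, grayscale conversion, and feature enhancement steps of claim 1 can be sketched as follows. This is a sketch under stated assumptions rather than the claimed implementation: the "Laplace transform" is interpreted here as the discrete 3x3 Laplacian operator commonly used for image sharpening, the conversion rate is taken to be a fraction between 0 and 1, and the grayscale weights 0.299, 0.587, and 0.114 merely stand in for the "preset channel pixel values". All function names are hypothetical.

```python
import random

# Assumed 3x3 discrete Laplacian kernel (4-neighbour form).
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]


def select_targets(num_images, conversion_rate, seed=0):
    """Pick indices of target training images.

    Assumption: conversion_rate is the fraction of images to convert.
    """
    count = round(num_images * conversion_rate)
    return sorted(random.Random(seed).sample(range(num_images), count))


def to_grayscale(image, weights=(0.299, 0.587, 0.114)):
    """Weighted sum of the R, G, B values of every pixel."""
    return [[sum(w * c for w, c in zip(weights, px)) for px in row]
            for row in image]


def laplacian_enhance(gray):
    """Subtract the Laplacian response from the image and clamp to [0, 255]."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            resp = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # replicate border pixels
                    xx = min(max(x + dx, 0), w - 1)
                    resp += LAPLACIAN[dy + 1][dx + 1] * gray[yy][xx]
            out[y][x] = min(max(gray[y][x] - resp, 0.0), 255.0)
    return out
```

With this kernel a flat region is left unchanged, while an isolated bright pixel is amplified and its direct neighbours are darkened, which is the edge-emphasising behaviour the feature enhancement step relies on.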
2. The method according to claim 1, characterized in that the training process of the target model includes: obtaining a first true label corresponding to the training images, and constructing a first pre-trained model; inputting the feature-enhanced image, together with the training images other than the target training image, into the first pre-trained model, and outputting a first prediction result through the first pre-trained model; constructing a first loss function based on the difference between the first true label and the first prediction result; minimizing the first loss function; training and adjusting the first pre-trained model according to the result of minimizing the first loss function to obtain a trained first pre-trained model, so that the features corresponding to the feature-enhanced image are learned by the trained first pre-trained model; and using the trained first pre-trained model as the target model.
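The way the training data of claim 2 is assembled can be sketched as follows. This is only an illustration: the function name `build_training_set` and the `enhance` callable are hypothetical, and the sketch assumes that each selected target training image is replaced by its feature-enhanced version while keeping its true label.

```python
def build_training_set(images, labels, target_indices, enhance):
    """Replace the selected target images with feature-enhanced versions.

    Labels are left unchanged, so the model sees the enhanced images
    under their true labels alongside the untouched training images.
    """
    targets = set(target_indices)
    mixed = [enhance(img) if i in targets else img
             for i, img in enumerate(images)]
    return list(zip(mixed, labels))
```

The resulting pairs can then be fed to any ordinary supervised training loop that minimizes the first loss function.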
3. The method according to claim 1, characterized in that the training process of the binary classifier includes: constructing a first initial model; inputting the feature-enhanced image into the target model and a pre-trained comparison model respectively, outputting a corresponding second prediction vector through the target model and marking the second prediction vector with a preset first identifier, and at the same time outputting a corresponding third prediction vector through the comparison model and marking the third prediction vector with a preset second identifier; inputting the second prediction vector marked with the first identifier and the third prediction vector marked with the second identifier into the first initial model, and outputting a first prediction classification result through the first initial model; constructing a second loss function based on the difference between the first prediction classification result and the first identifier or the second identifier; minimizing the second loss function; training and adjusting the first initial model according to the result of minimizing the second loss function to obtain the first model; and adding the first model to the binary classifier.
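The training procedure of claim 3 can be sketched with a small logistic regression. This is a minimal sketch, not the model used in this application: it assumes the first initial model is a linear classifier, the loss is binary cross-entropy, the first identifier is encoded as 1 and the second identifier as 0, and the prediction vectors shown in the usage note are made-up toy values. The names `train_binary_classifier` and `classify` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary_classifier(vectors, identifiers, lr=0.5, epochs=500):
    """Fit logistic regression by gradient descent on binary cross-entropy.

    identifiers: 1 for vectors from the target model (first identifier),
    0 for vectors from the comparison model (second identifier).
    """
    dim = len(vectors[0])
    n = len(vectors)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(vectors, identifiers):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of the loss with respect to the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def classify(w, b, vector):
    """Return 1 (first identifier) or 0 (second identifier)."""
    score = sum(wi * xi for wi, xi in zip(w, vector)) + b
    return 1 if sigmoid(score) > 0.5 else 0
```

For example, training on the toy vectors `[[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.3, 0.7]]` with identifiers `[1, 1, 0, 0]` yields a classifier that separates the two groups; a real first model would be trained on prediction vectors produced by the target model and the comparison model.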
4. The method according to claim 3, characterized in that the training process of the comparison model includes: obtaining a second true label corresponding to the training images, and constructing a second pre-trained model; inputting the training images into the second pre-trained model, and outputting a second prediction result through the second pre-trained model; constructing a third loss function based on the difference between the second true label and the second prediction result; minimizing the third loss function; training and adjusting the second pre-trained model according to the result of minimizing the third loss function to obtain a trained second pre-trained model; and using the trained second pre-trained model as the comparison model.
5. The method according to claim 1, characterized in that the training process of the binary classifier includes: constructing a second initial model; inputting the feature-enhanced image into the target model and a pre-trained comparison model respectively, obtaining a second model weight gradient of the target model with respect to the feature-enhanced image and marking the second model weight gradient with a preset first identifier, and at the same time obtaining a third model weight gradient of the comparison model with respect to the feature-enhanced image and marking the third model weight gradient with a preset second identifier; inputting the second model weight gradient marked with the first identifier and the third model weight gradient marked with the second identifier into the second initial model, and outputting a second prediction classification result through the second initial model; constructing a fourth loss function based on the difference between the second prediction classification result and the first identifier or the second identifier; minimizing the fourth loss function; training and adjusting the second initial model according to the result of minimizing the fourth loss function to obtain the second model; and adding the second model to the binary classifier.
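How a model weight gradient with respect to a feature-enhanced image might be obtained can be sketched for a toy model. The claim does not specify the model architecture or which loss is differentiated, so this sketch assumes a single-layer linear softmax classifier and a cross-entropy loss evaluated at the image's label; for that model the gradient has the closed form outer(p - onehot(label), x). The function names are hypothetical, and a deep model would normally rely on an automatic differentiation framework instead.

```python
import math


def softmax(logits):
    """Convert logits into class probabilities."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weight_gradient(weights, x, label):
    """Flattened gradient of the cross-entropy loss w.r.t. the weights.

    weights: one row of weights per class (linear softmax model).
    x: flattened feature-enhanced image.
    label: class index of the image (an assumption of this sketch).
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    probs = softmax(logits)
    grad = []
    for k, p in enumerate(probs):
        delta = p - (1.0 if k == label else 0.0)
        grad.extend(delta * xi for xi in x)
    return grad
```

The flattened gradient can then be marked with the first or second identifier and used as one training sample for the second initial model, in the same way that prediction vectors are used for the first initial model.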
6. The method according to claim 1, characterized in that the method further includes: if the first classification result is a preset first identifier, determining that the model to be identified was stolen from the target model in the first theft scenario; or, if the first classification result is a preset second identifier, determining that the model to be identified was not stolen from the target model in the first theft scenario; or, if the second classification result is the preset first identifier, determining that the model to be identified was stolen from the target model in the second theft scenario; or, if the second classification result is the preset second identifier, determining that the model to be identified was not stolen from the target model in the second theft scenario.
7. The method according to claim 1, characterized in that, after inputting the first prediction vector into the first model, corresponding to the first theft scenario, in the pre-trained binary classifier, obtaining the first classification result through the first model, and outputting the first classification result through the binary classifier, and simultaneously inputting the first model weight gradient into the second model, corresponding to the second theft scenario, in the binary classifier, obtaining the second classification result through the second model, and outputting the second classification result through the binary classifier, the method further includes: determining a first probability that the first classification result or the second classification result is a preset first identifier; determining a second probability that the first classification result or the second classification result is a preset second identifier; and if the first probability is greater than the second probability, determining that the model to be identified was stolen from the target model.
8. A model theft identification device, characterized in that it comprises: an acquisition module configured to acquire training images corresponding to a pre-trained target model, and a model to be identified corresponding to the target model; a selection module configured to select a target training image from the training images according to a preset conversion rate; a conversion module configured to convert the target training image into a grayscale image based on preset channel pixel values; a feature enhancement module configured to perform feature enhancement processing on the grayscale image using a Laplace transform to obtain a feature-enhanced image; a prediction module configured to input the feature-enhanced image into the model to be identified, output, through the model to be identified, a first prediction vector corresponding to the model to be identified, and obtain a first model weight gradient of the model to be identified with respect to the feature-enhanced image; and a classification module configured to input the first prediction vector into a first model, corresponding to a first theft scenario, in a pre-trained binary classifier, obtain a first classification result through the first model, and output the first classification result through the binary classifier, and simultaneously to input the first model weight gradient into a second model, corresponding to a second theft scenario, in the binary classifier, obtain a second classification result through the second model, and output the second classification result through the binary classifier; wherein the first classification result indicates whether the model to be identified was stolen from the target model in the first theft scenario, and the second classification result indicates whether the model to be identified was stolen from the target model in the second theft scenario.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the method according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium storing computer instructions, characterized in that the computer instructions are used to cause a computer to perform the method according to any one of claims 1 to 7.