A method, apparatus, device and storage medium for defending against model stealing attacks

CN117768192B · Active · Publication Date: 2026-06-26 · HARBIN INSTITUTE OF TECHNOLOGY (SHENZHEN) (INSTITUTE OF SCIENCE AND TECHNOLOGY INNOVATION HARBIN INSTITUTE OF TECHNOLOGY SHENZHEN)

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HARBIN INSTITUTE OF TECHNOLOGY (SHENZHEN) (INSTITUTE OF SCIENCE AND TECHNOLOGY INNOVATION HARBIN INSTITUTE OF TECHNOLOGY SHENZHEN)
Filing Date
2023-12-22
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing defenses perform poorly against model stealing attacks, and static defense mechanisms degrade the user experience and cannot accurately detect the true intent of users querying the model.

Method used

Query data is input into the target model to be defended to obtain the fully connected layer outputs fed to the activation function layer and the predicted confidence vector. A defense model with a confidence threshold, a malicious sample threshold, and a temperature decay factor then determines the nature of the user's queries and dynamically adjusts the predicted confidence vector accordingly.

Benefits of technology

It improves the effectiveness of defending against model theft attacks, enhances the user experience, reduces false positives, and improves the accuracy and security of query results.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN117768192B_ABST

Abstract

The application discloses a method, apparatus, device, and storage medium for defending against model stealing attacks, applied in the field of computer technology. The method includes: inputting query data into the corresponding target model to be defended to obtain the fully connected layer output parameters fed to the activation function layer and the prediction confidence vector; and inputting the fully connected layer output parameters and the prediction confidence vector into the target defense model, which adjusts the prediction confidence vector to obtain the target prediction confidence vector. The target defense model includes a confidence threshold, a malicious sample threshold, and a temperature attenuation factor: the confidence threshold determines whether a sample is malicious, the malicious sample threshold determines whether a user is malicious, and the temperature attenuation factor is set according to the required defense strength. Because the adjusted prediction confidence vector is produced by a dynamic model built on these three parameters, the application can defend against user queries in a targeted manner, thereby improving the defense effect.

Description

Technical Field

[0001] This invention relates to the field of computer technology, and in particular to a method, apparatus, device, and storage medium for defending against model theft attacks.

Background Technology

[0002] Model theft attacks use black-box access to a target model's outputs to replicate its functionality. For example, an attacker who steals a publicly deployed medical image recognition model can avoid paying for its use and can even offer the stolen copy to the public for a fee.

[0003] Most existing defenses use static mechanisms, which generally work by reducing the information in the model output, for example by returning only a single label or rounding confidences to a few decimal places. Such mechanisms cannot accurately detect the true intent of the user and degrade the experience of legitimate users.

Summary of the Invention

[0004] In view of this, the purpose of the present invention is to provide a method, apparatus, device and storage medium for defending against model theft attacks, which solves the technical problem of poor performance in defending against model theft attacks in the prior art.

[0005] To address the aforementioned technical problems, this invention provides a method for defending against model theft attacks, comprising:

[0006] The query data is input into the corresponding target model to be defended to obtain the fully connected layer output parameters fed to the activation function layer and the predicted confidence vector.

[0007] The fully connected layer output parameters and the predicted confidence vector are input into the target defense model, which adjusts the predicted confidence vector to obtain the target predicted confidence vector. The target defense model is a model that includes a confidence threshold, a malicious sample threshold, and a temperature decay factor. The confidence threshold is a threshold for determining whether a sample is malicious. The malicious sample threshold is a threshold for determining whether a user is malicious based on the current number of malicious queries. The temperature decay factor is a factor set based on the defense strength requirements.

[0008] Optionally, the step of inputting the query data into the corresponding target model to be defended to obtain the fully connected layer output parameters of the activation function layer and the predicted confidence vector includes:

[0009] The query data is input into the target model to be defended to obtain the input parameters (logits) of the normalized exponential function (softmax) and the prediction confidence vector.

[0010] Optionally, the step of inputting the fully connected layer output parameters and the predicted confidence vector into the target defense model and adjusting the predicted confidence vector to obtain the target predicted confidence vector includes:

[0011] The fully connected layer output parameters and the predicted confidence vector are input into the target defense model for a deep learning model at risk of being stolen, and the predicted confidence vector is adjusted to obtain the target predicted confidence vector.

[0012] Optionally, the step of inputting the fully connected layer output parameters and the predicted confidence vector into the target defense model and adjusting the predicted confidence vector to obtain the target predicted confidence vector includes:

[0013] Determine the largest confidence term in the predicted confidence vector;

[0014] Determine the relationship between the maximum confidence term and the confidence threshold;

[0015] If the maximum confidence term is less than the confidence threshold, the query is determined to be malicious, and the malicious query count is increased using a malicious query counter to obtain the target malicious query count.

[0016] When the target malicious query count exceeds the malicious sample threshold, the subject inputting the query data is determined to be a malicious user, and the predicted confidence vector is adjusted based on the temperature decay function and the fully connected layer output parameters to obtain the target adjusted predicted confidence vector. The temperature decay function determines, from the target malicious query count, the temperature used to adjust the predicted confidence vector; the adjusted vector is computed from this temperature, the fully connected layer output parameters, and the activation function.

[0017] Optionally, after determining that the subject inputting the query data is a malicious user when the number of malicious queries exceeds the malicious sample threshold, the method further includes:

[0018] Determine whether the number of benign queries exceeds the benign sample threshold;

[0019] When the number of benign queries exceeds the benign sample threshold, the number of malicious queries is reduced.

[0020] Optionally, adjusting the prediction confidence vector based on the temperature decay function and the output parameters of the fully connected layer to obtain the target adjusted prediction confidence vector includes:

[0021] The temperature decay function is invoked to perform temperature decay or compensation using the temperature decay factor, obtaining the target adjusted prediction confidence vector.

[0022] Optionally, after determining the relationship between the maximum confidence term and the confidence threshold, the method further includes:

[0023] When the maximum confidence term is not less than the confidence threshold, the query is determined to be a benign query, and the number of benign queries is increased using a benign query counter.

[0024] This application also provides an apparatus for defending against model theft attacks, comprising:

[0025] a fully connected layer output parameter and predicted confidence vector determination module, configured to input query data into the corresponding target model to be defended and obtain the fully connected layer output parameters fed to the activation function layer and the predicted confidence vector;

[0026] a prediction confidence vector adjustment module, configured to input the fully connected layer output parameters and the prediction confidence vector into the target defense model and adjust the prediction confidence vector to obtain the target prediction confidence vector; wherein the target defense model is a model including a confidence threshold, a malicious sample threshold, and a temperature decay factor, the confidence threshold is a threshold for determining whether a sample is malicious, the malicious sample threshold is a threshold for determining whether a user is malicious based on the current number of malicious queries, and the temperature decay factor is a factor set based on the defense strength requirements.

[0027] This application also provides a device for defending against model theft attacks, including:

[0028] a memory, configured to store a computer program;

[0029] a processor, configured to execute the computer program to implement the steps of the method for defending against model theft attacks as described above.

[0030] This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method for defending against model theft attacks as described above.

[0031] As can be seen, this invention inputs query data into the corresponding target model to be defended to obtain the fully connected layer output parameters fed to the activation function layer and the predicted confidence vector; these are then input into the target defense model, which adjusts the predicted confidence vector to obtain the target predicted confidence vector. The target defense model includes a confidence threshold, a malicious sample threshold, and a temperature decay factor. The confidence threshold is used to determine whether a sample is malicious; the malicious sample threshold is used to determine whether a user is malicious based on the current number of malicious queries; and the temperature decay factor is set according to the required defense strength. Unlike current static adjustment techniques, this application adjusts the predicted confidence vector dynamically, because the numbers of malicious and benign queries change as the user keeps querying. This enables targeted defense against user queries, improving the defense effect and enhancing the user experience.

[0032] In addition, the present invention also provides an apparatus, a device, and a storage medium for defending against model theft attacks, which have the same beneficial effects.

Attached Figure Description

[0033] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.

[0034] Figure 1 is a flowchart illustrating a method for defending against model theft attacks provided in an embodiment of the present invention;

[0035] Figure 2 is a flowchart illustrating a method for determining user query intent provided in an embodiment of the present invention;

[0036] Figure 3 is a flowchart illustrating a prediction confidence vector adjustment method based on a temperature decay factor provided in an embodiment of the present invention;

[0037] Figure 4 is a flowchart illustrating a method for defending against model theft attacks provided in an embodiment of the present invention;

[0038] Figure 5 is a schematic diagram of an apparatus for defending against model theft attacks provided in an embodiment of the present invention;

[0039] Figure 6 is a schematic diagram of a device for defending against model theft attacks provided in an embodiment of the present invention.

Detailed Implementation

[0040] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0041] Please refer to Figure 1, which is a flowchart illustrating a method for defending against model theft attacks provided in an embodiment of the present invention. The method may include:

[0042] S101, input the query data into the corresponding target model to be defended to obtain the fully connected layer output parameters fed to the activation function layer and the prediction confidence vector.

[0043] This embodiment does not limit the specific target model to be defended; for example, it can be a deep learning model or a neural network model. Nor does it limit the specific activation function layer; for example, the activation function can be softmax (the normalized exponential function) or tanh. This embodiment inputs query data into the corresponding target model to be defended and obtains the fully connected layer output parameters (logits) fed to the activation function layer, together with the predicted confidence vector.

[0044] It should be further explained that, to improve the accuracy of the query results, inputting the query data into the corresponding target model to be defended to obtain the fully connected layer output parameters and the predicted confidence vector can include: inputting the query data into the target model to obtain the input parameters (logits) of the normalized exponential function and the predicted confidence vector. The normalized exponential function (softmax) is chosen because it performs well and yields accurate confidence values.
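As a minimal sketch of S101 with a softmax activation layer, the snippet below turns a hypothetical logits vector (the fully connected layer outputs) into a prediction confidence vector and picks out its largest confidence term. The function names and example logits are illustrative assumptions, not taken from the patent; the optional `temperature` argument is included only because the later adjustment step rescales the same logits by a temperature.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Normalized exponential function over logits divided by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def max_confidence(confidences: List[float]) -> Tuple[int, float]:
    """Return the index and value of the largest confidence term."""
    index = max(range(len(confidences)), key=confidences.__getitem__)
    return index, confidences[index]


logits = [2.0, 1.0, 0.1]  # hypothetical fully connected layer outputs
confidences = softmax(logits)
print(max_confidence(confidences))
```

With these logits the largest confidence term belongs to class 0 and is roughly 0.66; this is the value the defense model later compares with the confidence threshold.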

[0045] S102, input the fully connected layer output parameters and the predicted confidence vector into the target defense model, which adjusts the predicted confidence vector to obtain the target predicted confidence vector. The target defense model includes a confidence threshold, a malicious sample threshold, and a temperature decay factor: the confidence threshold is the threshold for determining whether a sample is malicious, the malicious sample threshold is the threshold for determining whether a user is malicious based on the current number of malicious queries, and the temperature decay factor is set according to the required defense strength.

[0046] This embodiment inputs the fully connected layer output parameters and the predicted confidence vector into the target defense model, which adjusts the predicted confidence vector to obtain the target predicted confidence vector. The target defense model in this embodiment is a model that protects the target model to be defended from being stolen. When the fully connected layer output parameters and the predicted confidence vector are input into it, the target defense model first determines the maximum confidence term in the predicted confidence vector. By comparing the maximum confidence term with the confidence threshold, it determines whether the current query is malicious and therefore whether to update the malicious query count. Then, based on the latest malicious query count, it determines whether the subject issuing the current query is a malicious user. If so, the predicted confidence vector is adjusted based on the malicious query count, the temperature decay factor, and the fully connected layer output parameters. In this embodiment, the confidence threshold is generally not less than 0.6 and not more than 1; the malicious sample threshold is generally greater than 100; and the temperature decay factor is a number greater than 0 and less than 1 that represents the defense strength: the smaller the value, the greater the decay and the stronger the defense. A value of 0.9 is generally sufficient.
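The parameter ranges above can be captured in a small configuration object. This is a sketch under assumptions: the class and field names are hypothetical, the hard checks only enforce what the text states as strict bounds (decay factor strictly between 0 and 1, positive initial temperature), and the "typical" ranges (confidence threshold in [0.6, 1], malicious sample threshold above 100) are reported rather than enforced. The text gives no typical value for the benign threshold, so it has no default here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefenseConfig:
    """Hypothetical container for the defense model's server-side parameters."""

    confidence_threshold: float  # queries whose top confidence is below this count as malicious
    malicious_threshold: int     # tolerated number of malicious queries (typically > 100)
    benign_threshold: int        # benign queries needed before compensation (no value given in the text)
    decay_factor: float = 0.9    # in (0, 1); smaller means stronger defense
    initial_temperature: float = 1.0  # typically 1

    def __post_init__(self) -> None:
        # Hard constraints implied by the description.
        if not 0.0 < self.confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must lie in (0, 1]")
        if not 0.0 < self.decay_factor < 1.0:
            raise ValueError("decay_factor must lie strictly between 0 and 1")
        if self.malicious_threshold < 0 or self.benign_threshold < 0:
            raise ValueError("thresholds must be non-negative")
        if self.initial_temperature <= 0:
            raise ValueError("initial_temperature must be positive")

    def is_typical(self) -> bool:
        """True if the values fall in the ranges the embodiment describes as typical."""
        return 0.6 <= self.confidence_threshold <= 1.0 and self.malicious_threshold > 100


# Hypothetical values chosen only for illustration.
config = DefenseConfig(confidence_threshold=0.7, malicious_threshold=150, benign_threshold=50)
print(config.is_typical())
```

Separating hard validation from the advisory `is_typical` check keeps the sketch usable for small toy thresholds in tests while still flagging deployments that stray from the recommended ranges.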

[0047] It should be further explained that, to improve the applicability of the defense, inputting the fully connected layer output parameters and the predicted confidence vector into the target defense model and adjusting the predicted confidence vector to obtain the target predicted confidence vector can include: inputting them into the target defense model for a deep learning model that is at risk of being stolen, and adjusting the predicted confidence vector to obtain the target predicted confidence vector. That is, the target model to be defended in this embodiment can be a deep learning model.

[0048] It should be further explained that, to improve the accuracy of the adjustment, inputting the fully connected layer output parameters and the predicted confidence vector into the target defense model and adjusting the predicted confidence vector to obtain the target predicted confidence vector may include:

[0049] S1021, Determine the maximum confidence term in the prediction confidence vector;

[0050] S1022, Determine the relationship between the maximum confidence term and the confidence threshold;

[0051] S1023, when the maximum confidence term is less than the confidence threshold, it is determined that the query is malicious, and the malicious query count is increased by using the malicious query counter to obtain the target malicious query count;

[0052] S1024, when the target malicious query count exceeds the malicious sample threshold, determine that the subject inputting the query data is a malicious user, and adjust the prediction confidence vector based on the temperature decay function and the fully connected layer output parameters to obtain the target adjusted prediction confidence vector. The temperature decay function determines, from the target malicious query count, the temperature used to adjust the prediction confidence vector; the adjusted vector is computed from this temperature, the fully connected layer output parameters, and the activation function.
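Steps S1021 to S1024 amount to a threshold test per query plus a running count. The sketch below assumes the confidence vectors are already computed; the function names, the example vectors, and the toy malicious threshold of 2 (far below the values above 100 the embodiment recommends, chosen only to keep the example short) are hypothetical.

```python
from typing import Iterable, List


def is_malicious_query(confidences: List[float], confidence_threshold: float) -> bool:
    """S1021 to S1023: flag a query whose largest confidence term is below the threshold."""
    return max(confidences) < confidence_threshold


def is_malicious_user(malicious_count: int, malicious_threshold: int) -> bool:
    """S1024: the querying subject is malicious once the count exceeds the threshold."""
    return malicious_count > malicious_threshold


def count_malicious(stream: Iterable[List[float]], confidence_threshold: float) -> int:
    """Run the malicious query counter over a sequence of confidence vectors."""
    count = 0
    for confidences in stream:
        if is_malicious_query(confidences, confidence_threshold):
            count += 1  # malicious query counter increment
    return count


# Hypothetical confidence vectors: two confident queries and three low-confidence ones.
queries = [
    [0.95, 0.03, 0.02],
    [0.40, 0.35, 0.25],
    [0.34, 0.33, 0.33],
    [0.88, 0.10, 0.02],
    [0.50, 0.30, 0.20],
]
n_malicious = count_malicious(queries, confidence_threshold=0.6)
print(n_malicious, is_malicious_user(n_malicious, malicious_threshold=2))
```

Here three of the five vectors have a top term below 0.6, so the count reaches 3 and, with the toy threshold of 2, the user is flagged. The low-confidence test is what lets the scheme pick up out-of-distribution probing queries typical of model stealing.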

[0053] In this embodiment, the temperature decay function adjusts the temperature according to the temperature decay factor, thereby adjusting the prediction confidence vector. Each sample has exactly one confidence vector, which contains multiple confidence terms representing the probability of each class.

[0054] It should be further explained that, to reduce false positives, after the subject inputting the query data is determined to be a malicious user because the number of malicious queries exceeds the malicious sample threshold, the method can further include: determining whether the number of benign queries exceeds the benign sample threshold, and, if it does, reducing the number of malicious queries. The rationale is that a large number of benign queries suggests the user was previously flagged by mistake, so the malicious query count is reduced to prevent false positives.

[0055] It should be further explained that, to improve the accuracy and security of the output predicted confidence vector, adjusting the predicted confidence vector based on the temperature decay function and the fully connected layer output parameters to obtain the target adjusted predicted confidence vector may include: calling the temperature decay function to perform temperature decay or compensation using the temperature decay factor. In this embodiment, temperature decay means reducing the temperature, and temperature compensation means increasing it. For example, when a user is determined to be malicious, the temperature is multiplied by the temperature decay factor once for each additional malicious sample; when a user is determined to be benign, the temperature is divided by the temperature decay factor once for each additional benign sample. Consequently, when the subject of the current query is determined to be a benign user, the output predicted confidence vector can carry more information; when the subject is determined to be a malicious user, it carries less.
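The decay-and-compensation rule can be sketched as a closed-form temperature, assuming one decay step per malicious query beyond the malicious sample threshold and an initial temperature of 1. This closed form and the function name `decayed_temperature` are our reading of the text, not a formula quoted from the patent.

```python
def decayed_temperature(
    malicious_count: int,
    malicious_threshold: int,
    decay_factor: float,
    initial_temperature: float = 1.0,
) -> float:
    """Temperature after multiplying the initial temperature by the decay
    factor once per malicious query beyond the threshold.

    Lowering the malicious count by one (compensation) therefore divides the
    temperature by the decay factor, undoing one decay step.
    """
    excess = max(0, malicious_count - malicious_threshold)
    return initial_temperature * decay_factor ** excess


for count in (100, 101, 102, 103, 102):  # the final step models one compensation
    print(count, round(decayed_temperature(count, 100, 0.9), 4))
```

With a threshold of 100 and a factor of 0.9, the temperature stays at 1.0 up to the threshold, then falls to 0.9, 0.81, and 0.729, and returns to 0.81 after one compensation step. Expressing the temperature as a function of the counter means compensation needs no separate state: decrementing the counter is enough.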

[0056] It should be further explained that, to prevent users from being mistakenly identified as malicious, after the relationship between the maximum confidence term and the confidence threshold is determined, the method can further include: when the maximum confidence term is not less than the confidence threshold, determining that the query is benign and incrementing the benign query counter. Counting benign queries in this way keeps the benign query count accurate, which enables the temperature compensation measure, improves the user experience, and prevents users from being mistakenly identified as malicious.
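The benign-side bookkeeping of [0054] and [0056] can be sketched as per-user counters. This assumes, beyond what the text states, that the benign count restarts at 0 after each compensation step and that the malicious count never drops below 0; the `UserCounters` class, the `record_query` function, and the toy thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UserCounters:
    """Hypothetical per-user state for the malicious and benign query counters."""

    malicious: int = 0
    benign: int = 0
    blacklisted: bool = False


def record_query(
    counters: UserCounters,
    low_confidence: bool,
    malicious_threshold: int,
    benign_threshold: int,
) -> None:
    """Update the counters for one query whose classification is already known."""
    if low_confidence:
        counters.malicious += 1
        if counters.malicious > malicious_threshold:
            counters.blacklisted = True
    else:
        counters.benign += 1  # benign query counter
        if counters.blacklisted and counters.benign > benign_threshold:
            # Many benign queries suggest an earlier false positive: undo one malicious count.
            counters.malicious = max(0, counters.malicious - 1)
            counters.benign = 0  # assumption: restart the benign count after each step


user = UserCounters()
for flag in [True, True, True, False, False, False]:
    record_query(user, flag, malicious_threshold=2, benign_threshold=2)
print(user)
```

After three low-confidence queries the user is flagged; the third benign query then exceeds the benign threshold of 2, so one malicious count is removed and the benign count restarts, leaving `malicious=2`, `benign=0`, `blacklisted=True`.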

[0057] This invention provides a method for defending against model theft attacks, which may include: S101, inputting query data into the corresponding target model to be defended to obtain the fully connected layer output parameters fed to the activation function layer and the predicted confidence vector; S102, inputting the fully connected layer output parameters and the predicted confidence vector into the target defense model, which adjusts the predicted confidence vector to obtain the target predicted confidence vector. The target defense model includes a confidence threshold, a malicious sample threshold, and a temperature decay factor: the confidence threshold determines whether a sample is malicious, the malicious sample threshold determines whether a user is malicious, and the temperature decay factor is set according to the required defense strength. Compared with current static adjustment techniques, this application dynamically adjusts the predicted confidence vector using these three parameters, which enables targeted defense against user queries, improving the defense effect and enhancing the user experience. Furthermore, the normalized exponential function is used to improve the performance and accuracy of the predicted confidence vector; the target defense model in this embodiment can be applied to deep learning models; when the number of benign queries exceeds the benign sample threshold, this embodiment reduces the number of malicious queries to prevent misjudgment; and the benign query counter, together with the temperature compensation measure, improves the user experience and prevents users from being misjudged as malicious.

[0058] For a clearer understanding of this invention, refer to Figure 2, which is a flowchart illustrating a method for determining user query intent provided in an embodiment of the present invention. The method may specifically include:

[0059] In this embodiment, parameters must be set on the server side before the user's query intent is determined. The main functions of the server side are to provide the model's prediction service and to set the thresholds of the defense framework. Specifically, the server side performs the following steps:

[0060] Set the confidence threshold, the malicious user judgment threshold, the benign user judgment threshold, and the temperature decay factor of the defense framework.

[0061] Receive the query data sent by the client and input it into the target model to obtain the input of the activation function layer (the logits) and the output prediction vector.

[0062] Input the output prediction vector and the layer input into the defense framework, and wait for the user's next query.

[0063] After configuring the above parameters, the steps to determine the user's query intent may include:

[0064] Obtain the confidence threshold, the malicious user judgment threshold, the benign user judgment threshold, and the temperature decay factor set on the server side, and at the same time receive the current sample's layer input (logits) and output confidence vector.

[0065] In the user intent monitoring phase, the maximum term of the predicted confidence vector is compared with the confidence threshold. If it is below the threshold, the query is judged malicious and the malicious query counter is incremented; otherwise, the query is benign, and ... is set to 0. Based on the malicious query counter and the malicious sample threshold, the system determines whether the user is malicious; if so, the user is added to a blacklist. If a blacklisted user subsequently makes more benign queries than the benign user threshold, the malicious query counter is decremented by 1.

[0066] For a clearer understanding of this invention, refer to Figure 3, which is a flowchart illustrating a prediction confidence vector adjustment method based on a temperature decay factor provided in an embodiment of the present invention. The method may specifically include:

[0067] Based on the user's intent, invoke the adaptive temperature adjustment function to apply temperature attenuation or compensation to the prediction vector, obtaining the adjusted output (the target prediction confidence vector).

[0068] The adjusted output is sent to the client, and the server waits for the next input.

[0069] The adaptive temperature adjustment function in this embodiment is the temperature decay function mentioned above.

[0070] For ease of understanding, this embodiment provides a specific target defense model (including the temperature decay function) involving the following formulas:

[0071] [Formula not reproduced in this text: confidence vector adjustment function]

[0072] This formula is the confidence vector adjustment function. Its symbols denote the adjusted confidence terms, which together form the adjusted prediction confidence vector (the target prediction confidence vector); the logits terms before adjustment (the fully connected layer output parameters), which together form the logits vector; the temperature decay function (defined by the formula below); and the malicious sample counter (defined by the expression further below).

[0073] [Formula not reproduced in this text: temperature decay function]

[0074] This formula is the temperature decay function: once a user is identified as malicious, the temperature begins to decay. Its symbols denote the initial temperature, typically set to 1; the temperature attenuation factor, a number greater than 0 and less than 1 that sets the defense strength (the smaller the value, the greater the attenuation and the stronger the defense) and that can be adjusted according to the server's requirements; and the malicious user (malicious sample) threshold, i.e., the number of malicious samples the system tolerates, which can also be adjusted according to the server's requirements.

[0075] [Expression not reproduced in this text: malicious sample counter update]

[0076] In this expression, the symbols denote the confidence threshold used to determine whether a sample is benign or malicious, the maximum confidence term, the benign user threshold (benign sample threshold), which can be set according to specific server requirements, and the benign sample counter. When a user is judged to be malicious, temperature decay occurs. When a user is judged to be benign, a temperature compensation strategy is introduced to avoid false positives, as shown in the second case of the expression and in the following formula:

[0077] [Formula not reproduced in this text: temperature compensation]
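The formulas themselves are missing from this text. One set of expressions consistent with the definitions in [0072] to [0076] and with the per-sample decay and compensation rule in [0055] is sketched below. The notation is our own assumption: $z_i$ for the logits, $\hat{p}_i$ for the adjusted confidence terms, $T_0$ for the initial temperature, $\gamma$ for the temperature attenuation factor, $N_m$ and $N_b$ for the malicious and benign user thresholds, $n_m$ and $n_b$ for the malicious and benign sample counters, $\tau$ for the confidence threshold, and $p_{\max}$ for the maximum confidence term. This is a reconstruction under assumptions, not the patent's exact formulas; in particular, how $n_b$ evolves after a compensation step is not specified.

```latex
\hat{p}_i = \frac{\exp\!\left(z_i / T(n_m)\right)}{\sum_{j} \exp\!\left(z_j / T(n_m)\right)}

T(n_m) =
\begin{cases}
  T_0, & n_m \le N_m \\
  T_0\,\gamma^{\,n_m - N_m}, & n_m > N_m
\end{cases}

n_m \leftarrow
\begin{cases}
  n_m + 1, & p_{\max} < \tau \\
  n_m - 1, & p_{\max} \ge \tau,\ \text{user already flagged},\ n_b > N_b \\
  n_m, & \text{otherwise}
\end{cases}
```

Under this reading, compensation is the second case of the counter update: each decrement of $n_m$ above $N_m$ divides the temperature by $\gamma$, which matches the description of dividing the temperature by the decay factor for benign users.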

[0078] For a clearer understanding of this invention, refer to Figure 4, which is a flowchart illustrating a method for defending against model theft attacks provided in an embodiment of the present invention. The method may specifically include:

[0079] S401, Obtain the target defense model; wherein, the defense model is a model that adjusts the amount of information output by using a confidence vector threshold, a malicious user determination threshold, a benign user determination threshold, and a temperature decay factor.

[0080] S402, set the confidence vector threshold, malicious user determination threshold, benign user determination threshold and temperature decay factor of the target defense model to obtain the target defense model.

[0081] S403 receives the query data sent by the client and inputs it into the corresponding target model to obtain the input logits and prediction confidence vector of the softmax layer.

[0082] S404 inputs logits and predicted confidence vectors into the target defense model and makes a judgment based on the maximum confidence term and confidence threshold.

[0083] In this embodiment, the maximum confidence term is the confidence term in the predicted confidence vector.

[0084] S405. If the maximum confidence term is not less than the confidence threshold, then the query is determined to be a benign query.

[0085] S406. If the maximum confidence term is less than the confidence threshold, then the query is determined to be malicious, and the number of malicious queries is increased.

[0086] S407 determines whether a user is malicious based on the relationship between the number of malicious queries and the malicious sample threshold.

[0087] S408. If the number of malicious queries exceeds the malicious sample threshold, the user is identified as malicious and added to the blacklist.

[0088] S409: If a blacklisted user's subsequent benign queries exceed the benign user threshold, reduce that user's malicious query count.

[0089] S410: Based on the user's query intent, call the adaptive temperature adjustment function to apply temperature decay or compensation to the prediction confidence vector, and obtain the adjusted output.

[0090] S411: Send the adjusted output to the client and wait for the next input from the client.
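Steps S404 to S410 amount to a small per-user state machine. The Python sketch below is one possible reading under stated assumptions: the class and field names are hypothetical; S409 is assumed to decrement the malicious query count by one and restart the benign count; a user is assumed to leave the blacklist once the count falls back to the malicious sample threshold; and the temperature uses a hypothetical geometric schedule rather than the formula in [0077].

```python
import math
from dataclasses import dataclass, field


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax (temperature must be positive)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


@dataclass
class UserState:
    malicious_count: int = 0
    benign_count: int = 0
    blacklisted: bool = False


@dataclass
class DefenseModel:  # hypothetical name for the target defense model
    confidence_threshold: float  # S405/S406: benign vs. malicious query
    malicious_threshold: int     # S407/S408: malicious user
    benign_threshold: int        # S409: compensation trigger
    decay_factor: float          # temperature decay factor, in (0, 1)
    min_temperature: float = 0.01
    users: dict = field(default_factory=dict)

    def temperature(self, state):
        # Assumed schedule: T = 1 until the count exceeds the threshold.
        excess = state.malicious_count - self.malicious_threshold
        if excess <= 0:
            return 1.0
        return max(self.decay_factor ** excess, self.min_temperature)

    def respond(self, user_id, logits):
        state = self.users.setdefault(user_id, UserState())
        probs = softmax(logits)                      # S403: confidence vector
        if max(probs) >= self.confidence_threshold:  # S404/S405: benign query
            state.benign_count += 1
        else:                                        # S406: malicious query
            state.malicious_count += 1
        # S407/S408: blacklist once malicious queries exceed the threshold.
        if not state.blacklisted and state.malicious_count > self.malicious_threshold:
            state.blacklisted = True
            state.benign_count = 0  # S409 counts only subsequent benign queries
        # S409 (assumed rule): enough benign queries lower the malicious count.
        if state.blacklisted and state.benign_count > self.benign_threshold:
            state.malicious_count -= 1
            state.benign_count = 0
            state.blacklisted = state.malicious_count > self.malicious_threshold
        # S410: temperature decay or compensation applied to the logits;
        # the returned vector is what S411 sends back to the client.
        return softmax(logits, self.temperature(state))


defense = DefenseModel(confidence_threshold=0.9, malicious_threshold=2,
                       benign_threshold=3, decay_factor=0.5)
for _ in range(3):  # low-confidence queries
    defense.respond("alice", [0.1, 0.0, 0.0])
flagged_temperature = defense.temperature(defense.users["alice"])
for _ in range(4):  # confident queries trigger compensation (S409)
    defense.respond("alice", [10.0, 0.0, 0.0])
```

Keeping the state per user is what makes the defense targeted: a second client querying the same model starts with its own counters and receives unmodified outputs.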

[0091] In this embodiment, the main function of the client is to obtain the deep learning services provided by the server. Specifically, the client performs the following steps: provide an input box in which the user can enter or upload query data, such as images or text; send the query data to the server and wait for the server's defense model to respond; receive the output returned by the server and display it on the screen, for example as classification labels or probability distributions; and, if the user wants to end the session, provide an exit button with which the user can disconnect from the server.

[0092] This strategy, which uses confidence to detect out-of-distribution samples, is simple and effective. Characterizing a user's query intent from the cumulative number of malicious queries is highly fault-tolerant and effectively monitors user intent. The strategy also applies adaptive temperature decay based on the number of malicious queries, enabling targeted defense against individual users' queries. Moreover, temperature adjustment does not affect the model's prediction accuracy, because dividing the logits by a positive temperature preserves their order and therefore does not change the predicted class. To compensate users who were falsely flagged, this embodiment applies adaptive temperature compensation, which restores the output quality delivered to benign users who were misdetected.
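The statement that temperature adjustment leaves prediction accuracy unchanged follows from monotonicity: for any temperature T > 0, the largest entry of softmax(z / T) sits at the same index as the largest logit, because dividing by a positive constant and taking the exponential both preserve order. The short Python check below illustrates this on an arbitrary, made-up logit vector; only the size of the top confidence changes, not which class is predicted.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; temperature must be positive."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


logits = [1.5, 3.2, -0.7, 3.0]  # made-up logits for illustration
temperatures = (0.1, 0.5, 1.0, 2.0, 10.0)
predicted = [argmax(softmax(logits, t)) for t in temperatures]
top_confidence = [max(softmax(logits, t)) for t in temperatures]
# predicted is the same class for every temperature; top_confidence
# shrinks as the temperature grows (the output becomes flatter).
```

This is why the defense can degrade the information carried by the confidence values, which an attacker uses to train a substitute model, without changing the labels a benign user sees.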

[0093] The apparatus for defending against model theft attacks provided in the embodiments of the present invention is described below. The apparatus described below and the method described above correspond to each other and may be cross-referenced.

[0094] Refer to Figure 5, which is a schematic diagram of an apparatus for defending against model theft attacks provided in an embodiment of the present invention. The apparatus may include:

[0095] The fully connected layer output parameters and prediction confidence vector determination module 100 is used to input query data into the corresponding target model to be defended, and obtain the fully connected layer output parameters and prediction confidence vector of the activation function layer;

[0096] The prediction confidence vector adjustment module 200 is used to input the fully connected layer output parameters and the prediction confidence vector into the target defense model, adjust the prediction confidence vector, and obtain the target prediction confidence vector. The target defense model includes a confidence threshold, a malicious sample threshold, and a temperature decay factor: the confidence threshold is used to determine whether a query sample is malicious, the malicious sample threshold is used to determine whether a user is malicious based on the current number of malicious queries, and the temperature decay factor is set according to the required defense strength.
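Module 100 can be pictured as an ordinary forward pass that keeps two values: the fully connected layer output (the logits fed into the softmax activation) and the softmax output (the prediction confidence vector). Module 200 needs the logits, not only the probabilities, so that it can re-apply softmax at an adjusted temperature. The one-layer classifier, its weights, and the function name `module_100` below are hypothetical stand-ins for the target model to be defended, used only to make the two outputs concrete.

```python
import math


def fully_connected(x, weights, bias):
    """Logits z = W x + b, with one row of W per output class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def module_100(x, weights, bias):
    """Return (fully connected layer output, prediction confidence vector)."""
    logits = fully_connected(x, weights, bias)
    return logits, softmax(logits)


# Hypothetical 3-class model with 2 input features.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, -3.0]
logits, confidence = module_100([1.0, 2.0], W, b)
# logits == [1.0, 2.0, 0.0]; confidence sums to 1 and peaks at class 1
```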

[0097] Furthermore, based on the above embodiments, the module for determining the output parameters and prediction confidence vector of the fully connected layer may include:

[0098] The fully connected layer output parameter and prediction confidence vector determination unit is used to input the query data into the target defense model to obtain the input parameters (logits) of the normalized exponential (softmax) function and the prediction confidence vector.

[0099] Furthermore, based on any of the above embodiments, the prediction confidence vector adjustment module 200 may include:

[0100] The prediction confidence vector adjustment unit is used to input the fully connected layer output parameters and the prediction confidence vector into the target defense model for the deep learning model at risk of being stolen, adjust the prediction confidence vector, and obtain the target prediction confidence vector.

[0101] Furthermore, based on any of the above embodiments, the prediction confidence vector adjustment module 200 may include:

[0102] The maximum confidence term determination unit is used to determine the maximum confidence term in the predicted confidence vector;

[0103] The magnitude relationship determination unit is used to determine the relationship between the maximum confidence term and the confidence threshold;

[0104] The malicious query unit is used to determine that the query is malicious when the maximum confidence term is less than the confidence threshold, and to increment the malicious query count using a malicious query counter to obtain the target malicious query count;

[0105] The malicious user identification unit is used to determine that the subject inputting the query data is a malicious user when the target malicious query count exceeds the malicious sample threshold, and to adjust the prediction confidence vector based on a temperature decay function and the fully connected layer output parameters to obtain the target adjusted prediction confidence vector. The temperature decay function computes the temperature-adjusted prediction confidence vector from the target malicious query count, the fully connected layer output parameters, and the activation function.

[0106] Furthermore, based on the above embodiments, the device for defending against model theft attacks may further include:

[0107] The benign query count determination unit is used to determine whether the benign query count exceeds the benign sample threshold.

[0108] The malicious query reduction unit is used to reduce the malicious query count when the benign query count exceeds the benign sample threshold.

[0109] Furthermore, based on the above embodiments, the malicious user identification unit may include:

[0110] The temperature decay or compensation subunit is used to call the temperature decay function to apply decay or compensation to the temperature decay factor, thereby obtaining the target adjusted prediction confidence vector.

[0111] Furthermore, based on the above embodiments, the device for defending against model theft attacks may further include:

[0112] The benign query count increment unit is used to determine that the query is benign when the maximum confidence term is not less than the confidence threshold, and to increment the benign query count using a benign query counter.

[0113] It should be noted that the order of the modules and units in the above apparatus for defending against model theft attacks can be changed without affecting its logic.

[0114] The apparatus for defending against model theft attacks provided in this embodiment of the invention may include: a fully connected layer output parameter and prediction confidence vector determination module 100, used to input query data into the corresponding target model to be defended and obtain the fully connected layer output parameters and the prediction confidence vector of the activation function layer; and a prediction confidence vector adjustment module 200, used to input the fully connected layer output parameters and the prediction confidence vector into the target defense model, adjust the prediction confidence vector, and obtain the target prediction confidence vector. The target defense model includes a confidence threshold, a malicious sample threshold, and a temperature decay factor: the confidence threshold is used to determine whether a query sample is malicious, the malicious sample threshold is used to determine whether a user is malicious based on the current number of malicious queries, and the temperature decay factor is set according to the required defense strength. Compared with existing static adjustment techniques, this application dynamically adjusts the prediction confidence vector through the target defense model using these three parameters, which enables targeted defense against individual users' queries, improves the defense effect, and enhances the user experience.
In addition, the normalized exponential (softmax) function is used to improve the performance and accuracy of the prediction confidence vector; the target defense model in this embodiment can be applied to deep learning models; when the number of benign queries exceeds the benign sample threshold, the number of malicious queries is reduced to prevent misjudgment; and the benign query counter, together with temperature compensation, prevents benign users from being misjudged as malicious, thereby improving the user experience.

[0115] The device for defending against model theft attacks provided in an embodiment of the present invention is described below. The device described below and the method described above correspond to each other and may be cross-referenced.

[0116] Refer to Figure 6, which is a schematic diagram of a device for defending against model theft attacks provided in an embodiment of the present invention. The device may include:

[0117] Memory 10 is used to store computer programs;

[0118] Processor 20 is used to execute computer programs to implement the above-described method for defending against model theft attacks.

[0119] The memory 10, processor 20, and communication interface 30 all communicate with each other through the communication bus 40.

[0120] In this embodiment of the invention, the memory 10 is used to store one or more programs. The programs may include program code, which includes computer operation instructions. In this embodiment of the invention, the memory 10 may store programs for implementing the following functions:

[0121] The query data is input into the corresponding target model to be defended, and the fully connected layer output parameters and the predicted confidence vector of the activation function layer are obtained.

[0122] The fully connected layer output parameters and the predicted confidence vector are input into the target defense model, and the predicted confidence vector is adjusted to obtain the target predicted confidence vector. The target defense model is a model that includes a confidence threshold, a malicious sample threshold, and a temperature decay factor. The confidence threshold is the threshold for determining whether a sample is malicious. The malicious sample threshold is the threshold for determining whether a user is malicious based on the current number of malicious queries. The temperature decay factor is a factor set based on the defense strength requirements.

[0123] In one possible implementation, the memory 10 may include a program storage area and a data storage area, wherein the program storage area may store the operating system and applications required for at least one function; and the data storage area may store data created during use.

[0124] Furthermore, memory 10 may include read-only memory and random access memory, providing instructions and data to the processor. A portion of the memory may also include non-volatile random access memory (NVRAM). The memory stores an operating system and operating instructions, executable modules, or data structures, or subsets or extended sets thereof, wherein the operating instructions may include various operating instructions for implementing various operations. The operating system may include various system programs for implementing basic services and handling hardware-based tasks.

[0125] Processor 20 can be a central processing unit (CPU), an application-specific integrated circuit, a digital signal processor, a field-programmable gate array, or other programmable logic device. Processor 20 can be a microprocessor or any conventional processor. Processor 20 can call programs stored in memory 10.

[0126] The communication interface 30 can be an interface for the communication module, used to connect with other devices or systems.

[0127] It should be noted that the structure shown in Figure 6 does not limit the device for defending against model theft attacks in the embodiments of the present invention. In practical applications, the device may include more or fewer components than shown in Figure 6, or combine certain components.

[0128] The storage medium provided in the embodiments of the present invention is described below. The storage medium described below can be referred to in correspondence with the method for defending against model theft attacks described above.

[0129] The present invention also provides a storage medium storing a computer program, which, when executed by a processor, implements the steps of the above-described method for defending against model theft attacks.

[0130] The storage medium can include various media that can store program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0131] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, the description is relatively simple; relevant parts can be referred to in the method section.

[0132] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.

[0133] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus.

[0134] The present invention provides a detailed description of a method, apparatus, device, and storage medium for defending against model theft attacks. Specific examples have been used to illustrate the principles and implementation methods of the present invention. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of the present invention. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of the present invention. Therefore, the content of this specification should not be construed as a limitation of the present invention.

Claims

1. A method for defending against model theft attacks, characterized in that it comprises: inputting query data into the corresponding target model to be defended, and obtaining the fully connected layer output parameters and the predicted confidence vector of the activation function layer; inputting the fully connected layer output parameters and the predicted confidence vector into the target defense model, and adjusting the predicted confidence vector to obtain the target predicted confidence vector; wherein the target defense model is a model including a confidence threshold, a malicious sample threshold, and a temperature decay factor, the confidence threshold is a threshold for determining whether a sample is malicious, the malicious sample threshold is a threshold for determining whether a user is malicious based on the current number of malicious queries, and the temperature decay factor is a factor set based on the defense strength requirements; wherein inputting the fully connected layer output parameters and the predicted confidence vector into the target defense model and adjusting the predicted confidence vector to obtain the target predicted confidence vector comprises: determining the maximum confidence term in the predicted confidence vector; determining the relationship between the maximum confidence term and the confidence threshold; if the maximum confidence term is less than the confidence threshold, determining that the query is malicious, and incrementing the malicious query count using a malicious query counter to obtain the target malicious query count; and when the target malicious query count exceeds the malicious sample threshold, determining that the subject inputting the query data is a malicious user,
and adjusting the predicted confidence vector based on the temperature decay function and the fully connected layer output parameters to obtain the target adjusted predicted confidence vector, wherein the temperature decay function computes the temperature-adjusted predicted confidence vector from the target malicious query count, the fully connected layer output parameters, and the activation function.

2. The method for defending against model theft attacks according to claim 1, characterized in that inputting the query data into the corresponding target model to be defended and obtaining the fully connected layer output parameters and the predicted confidence vector of the activation function layer comprises: inputting the query data into the target defense model to obtain the input parameters of the normalized exponential function and the prediction confidence vector.

3. The method for defending against model theft attacks according to claim 1, characterized in that inputting the fully connected layer output parameters and the predicted confidence vector into the target defense model and adjusting the predicted confidence vector to obtain the target predicted confidence vector comprises: inputting the fully connected layer output parameters and the predicted confidence vector into the target defense model for the deep learning model at risk of being stolen, and adjusting the predicted confidence vector to obtain the target predicted confidence vector.

4. The method for defending against model theft attacks according to claim 1, characterized in that, after determining that the subject inputting the query data is a malicious user when the target malicious query count exceeds the malicious sample threshold, the method further comprises: determining whether the number of benign queries exceeds the benign sample threshold; and when the number of benign queries exceeds the benign sample threshold, reducing the malicious query count.

5. The method for defending against model theft attacks according to claim 1, characterized in that adjusting the prediction confidence vector based on the temperature decay function and the fully connected layer output parameters to obtain the target adjusted prediction confidence vector comprises: invoking the temperature decay function to apply temperature decay or compensation to the temperature decay factor, thereby obtaining the target adjusted prediction confidence vector.

6. The method for defending against model theft attacks according to claim 1, characterized in that, after determining the relationship between the maximum confidence term and the confidence threshold, the method further comprises: when the maximum confidence term is not less than the confidence threshold, determining that the query is benign, and incrementing the benign query count using a benign query counter.

7. A device for defending against model theft attacks, characterized in that it applies the method for defending against model theft attacks according to any one of claims 1 to 6 and comprises: a fully connected layer output parameter and predicted confidence vector determination module, used to input query data into the corresponding target model to be defended and obtain the fully connected layer output parameters and the predicted confidence vector of the activation function layer; and a prediction confidence vector adjustment module, used to input the fully connected layer output parameters and the prediction confidence vector into the target defense model, adjust the prediction confidence vector, and obtain the target prediction confidence vector; wherein the target defense model is a model including a confidence threshold, a malicious sample threshold, and a temperature decay factor, the confidence threshold is a threshold for determining whether a sample is malicious, the malicious sample threshold is a threshold for determining whether a user is malicious based on the current number of malicious queries, and the temperature decay factor is a factor set based on the defense strength requirements.

8. A device for defending against model theft attacks, characterized in that it comprises: a memory for storing a computer program; and a processor for executing the computer program to implement the steps of the method for defending against model theft attacks according to any one of claims 1 to 6.

9. A storage medium, characterized in that, The storage medium stores a computer program, which, when executed by a processor, implements the steps of the method for defending against model theft attacks as described in any one of claims 1 to 6.