Processing method for neural network model, and secure element and computing apparatus
By splitting the deployment of a neural network model between a general computing unit and a security unit, and using the security unit to store key parameters and code, the method addresses physical attacks and side-channel attacks on neural network models at application nodes, thereby protecting the model.
Patent Information
- Authority / Receiving Office: WO
- Patent Type: Applications
- Current Assignee / Owner: SHENZHEN GOODIX TECH CO LTD
- Filing Date: 2025-10-10
- Publication Date: 2026-06-25
AI Technical Summary
Existing technologies cannot effectively protect neural network models against physical attacks and side-channel attacks at application nodes, which leads to the theft of model parameters and harms the core interests of AI application developers and service providers.
One part of the neural network model is deployed in a general computing unit, and the other part is deployed in a secure unit. Through encrypted communication and storage technology, the secure unit stores key parameters and code to prevent physical attacks, and retrieves parameters or performs model inference when needed.
It effectively prevents physical and side-channel attacks on neural network models at application nodes, ensures the security of model parameters and computation processes, and protects the core assets of AI applications.
Description
Neural network model processing methods, security elements, and computing devices
[0001] This application claims priority to Chinese Patent Application No. 202411881479.4, filed on December 18, 2024, entitled “Processing Method, Security Unit and Computing Device for Neural Network Model”, the entire contents of which are incorporated herein by reference.
Technical Field
[0002] This application relates to the field of data security technology, specifically to a method for processing a neural network model, a security unit, and a computing device.
Background Technology
[0003] Training an artificial neural network (NN) on a server using massive amounts of data to achieve good predictive performance, and then deploying the trained NN model to application nodes, where the application nodes use the NN model to process application data and generate model outputs, is a classic approach in artificial intelligence (AI) applications. For developers and service providers of AI applications, their core asset is the NN model trained on the server. If this network model is attacked and stolen at the application node, it can easily be used elsewhere, resulting in a loss of core interests for the AI application developers and service providers.
[0004] Neural network model protection schemes in related technologies store the neural network model data in a trusted execution environment (TEE) or execute the model computation process within a TEE. However, this approach is vulnerable to physical attacks. For example, attackers can use fault injection attacks (also called error injection attacks) to read model parameters from the TEE, or use side-channel attacks that exploit the side-channel signals released during computation to steal model data, thus leading to the theft of the neural network model.
Summary of the Invention
[0005] In view of the above problems, embodiments of this application provide a method for processing neural network models, a secure element (SE), and a computing device to solve the above technical problems.
[0006] In a first aspect, embodiments of this application provide a computing device, comprising: a storage unit for storing first neural network code and at least a portion of first network parameters of a first model inference of a neural network model; a security unit for: storing the remaining first network parameters of the first model inference; and / or storing second neural network code and second network parameters of a second model inference of a neural network model, and performing second model inference based on the second network parameters and the second neural network code; and a general computing unit for performing first model inference based on the first neural network code and the first network parameters.
[0007] In some possible implementations, the general-purpose computing unit is further configured to: receive a neural network model and deployment instructions sent by a server, the deployment instructions indicating the neural network code and network parameters deployed in the general-purpose computing unit, and the neural network code and / or network parameters deployed in the security unit; and deploy the neural network model in the general-purpose computing unit and the security unit according to the deployment instructions.
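The deployment step described above can be sketched as follows. This is a minimal illustration, not the applicant's format: the `DeploymentInstruction` class, the `"general"` / `"secure"` location labels, the `deploy` function, and the layer names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DeploymentInstruction:
    """Hypothetical instruction format: layer name -> "general" or "secure"."""
    code_location: dict
    param_location: dict


def deploy(model, instruction):
    """Split model entries between the two units as the instruction indicates.

    `model` maps a layer name to {"code": ..., "params": ...}.
    Layers not named in the instruction default to the general unit.
    """
    units = {
        "general": {"code": {}, "params": {}},
        "secure": {"code": {}, "params": {}},
    }
    for layer, entry in model.items():
        code_unit = instruction.code_location.get(layer, "general")
        param_unit = instruction.param_location.get(layer, "general")
        units[code_unit]["code"][layer] = entry["code"]
        units[param_unit]["params"][layer] = entry["params"]
    return units["general"], units["secure"]


# Illustrative model: only the parameters of conv4 and all of fc8 go to the secure side.
model = {
    "conv2": {"code": "conv2d", "params": [0.1, 0.2]},
    "conv4": {"code": "conv2d", "params": [0.3, 0.4]},
    "fc8": {"code": "dense", "params": [0.5]},
}
instruction = DeploymentInstruction(
    code_location={"fc8": "secure"},
    param_location={"conv4": "secure", "fc8": "secure"},
)
general_part, secure_part = deploy(model, instruction)
```

Here the code of `conv4` stays with the general computing unit while its parameters go to the security unit, and `fc8` is placed entirely in the security unit, which mirrors the two deployment options named in the paragraph.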
[0008] In some possible implementations, the security unit is further configured to: receive a parameter read command from the general computing unit, return to the general computing unit the first network parameters stored therein that correspond to the parameter read command; and / or receive an execution command from the general computing unit, perform corresponding second model inference based on the second neural network code and the second network parameters corresponding to the execution command, and return the execution result to the general computing unit, the execution result including model output or intermediate result.
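A minimal sketch of how a security unit might dispatch the two commands described above. The message layout (`type`, `layer`, `input`), the `SecureUnit` class, and the `scale` layer are assumptions made for illustration; the source does not specify a command format.

```python
class SecureUnit:
    """Toy stand-in for the security unit; real hardware would add tamper resistance."""

    def __init__(self):
        self._first_params = {}   # parameters held for the first model inference
        self._second_code = {}    # code for the second model inference, by layer name
        self._second_params = {}  # parameters for the second model inference

    def store_first_params(self, layer, params):
        self._first_params[layer] = params

    def store_second_model(self, layer, code, params):
        self._second_code[layer] = code
        self._second_params[layer] = params

    def handle(self, command):
        """Dispatch one command received from the general computing unit."""
        layer = command.get("layer")
        if command.get("type") == "read_params" and layer in self._first_params:
            return {"status": "ok", "params": self._first_params[layer]}
        if command.get("type") == "execute" and layer in self._second_code:
            code = self._second_code[layer]
            result = code(command["input"], self._second_params[layer])
            return {"status": "ok", "result": result}
        return {"status": "error"}


def scale(values, weights):
    # Hypothetical layer: element-wise multiply, standing in for real layer math.
    return [v * w for v, w in zip(values, weights)]


se = SecureUnit()
se.store_first_params("conv4", [1.0, 2.0])
se.store_second_model("fc8", scale, [2.0, 3.0])
read_reply = se.handle({"type": "read_params", "layer": "conv4"})
exec_reply = se.handle({"type": "execute", "layer": "fc8", "input": [1.0, 1.0]})
```

In this sketch the second network parameters are never returned by a read command; only the execution result leaves the unit.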
[0009] In some possible implementations, the general computing unit is also configured to: send a parameter read command to the security unit, receive a first network parameter corresponding to the parameter read command returned by the security unit; and / or send an execution command to the security unit, and receive an execution result returned by the security unit after executing a second model inference corresponding to the execution command, the execution result including model output or intermediate results.
[0010] In some possible implementations, the first network parameters stored in the secure unit include at least a portion of the network parameters of at least one neural network layer of the first model inference; and / or the second neural network code includes the neural network code of at least one neural network layer.
[0011] In some possible implementations, the general-purpose computing unit and storage unit are located in a trusted execution environment (TEE) that performs the first model inference.
[0012] In some possible implementations, the general-purpose computing unit is deployed with a virtual machine that performs first model inference.
[0013] In a second aspect, embodiments of this application provide a secure unit, comprising: a secure storage module for storing a portion of first network parameters for a first model inference of a neural network model; and / or storing second neural network code and second network parameters for a second model inference of a neural network model; and a secure computing module for providing the stored first network parameters to a general computing unit; and / or performing second model inference based on the second network parameters and the second neural network code.
[0014] In some possible implementations, the secure computing module is further configured to: receive a parameter read command from the general computing unit, return to the general computing unit the first network parameters stored therein that correspond to the parameter read command; and / or receive an execution command from the general computing unit, perform corresponding second model inference based on the second neural network code and the second network parameters corresponding to the execution command, and return the execution result to the general computing unit, the execution result including model output or intermediate result.
[0015] In some possible implementations, the first network parameters stored in the secure unit include at least a portion of the network parameters of at least one neural network layer of the first model inference; and / or the second neural network code includes the neural network code of at least one neural network layer.
[0016] In a third aspect, embodiments of this application provide a method for processing a neural network model, applied to a security unit. The method includes: storing a portion of first network parameters for a first model inference of the neural network model, and / or second neural network code and second network parameters for a second model inference of the neural network model; providing the stored first network parameters to a general-purpose computing unit; and / or performing second model inference based on the second network parameters and the second neural network code, and returning the execution result to the general-purpose computing unit, the execution result including model output or intermediate results.
[0017] In some possible implementations, providing the stored first network parameters to the general-purpose computing unit includes: receiving a parameter read command from the general-purpose computing unit and returning the first network parameters stored therein that correspond to the parameter read command to the general-purpose computing unit.
[0018] In some possible implementations, performing second model inference based on second network parameters and second neural network code, and returning the execution result to the general computing unit, includes: receiving an execution command from the general computing unit; performing corresponding second model inference based on the second neural network code and second network parameters corresponding to the execution command to obtain the execution result; and returning the execution result to the general computing unit.
[0019] In some possible implementations, the first network parameters stored in the secure unit include at least a portion of the network parameters of at least one neural network layer of the first model inference; and / or the second neural network code includes the neural network code of at least one neural network layer.
[0020] In a fourth aspect, embodiments of this application provide a method for processing a neural network model, applied to a general-purpose computing unit. The method includes: receiving a neural network model and a deployment instruction sent by a server, wherein the deployment instruction is used to indicate the neural network code and network parameters deployed in the general-purpose computing unit, and the neural network code and / or network parameters deployed in a secure unit; and deploying the neural network model in the general-purpose computing unit and the secure unit according to the deployment instruction.
[0021] In some possible implementations, deploying the neural network model in the general computing unit and the security unit according to the deployment instruction includes: storing, in a storage unit, the first neural network code and at least a portion of the first network parameters of the first model inference of the neural network model; and instructing the security unit to store the remaining first network parameters of the first model inference, and / or to store the second neural network code and second network parameters of the second model inference of the neural network model.
[0022] In some possible implementations, the method further includes: sending a parameter read command to the security unit, receiving a first network parameter corresponding to the parameter read command returned by the security unit; and / or sending an execution command to the security unit, and receiving an execution result returned by the security unit after performing a second model inference corresponding to the execution command.
[0023] In some possible implementations, the network parameters deployed in the security element include at least a portion of the network parameters of at least one neural network layer; and / or the neural network code deployed in the security element includes the code of at least one network layer.
[0024] In a fifth aspect, embodiments of this application provide a method for processing a neural network model, applied to a server. The method includes: sending a neural network model and a deployment instruction to an application node, wherein the deployment instruction is used to indicate neural network code and network parameters deployed in a general computing unit, and neural network code and / or network parameters deployed in a secure unit.
[0025] In some possible implementations, the method further includes: adjusting the deployment location of the neural network code and / or network parameters, where the deployment location includes general computing units and security units; and sending a deployment instruction based on the adjustment to the application node.
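The server-side adjustment described above might look like the following sketch. The placement keys `(layer, kind)`, the location labels, and the message fields are hypothetical; the source does not define how an adjusted deployment instruction is encoded.

```python
def adjust_deployment(current, moves):
    """Return a new placement with the requested moves applied.

    `current` and `moves` map (layer, kind) -> "general" or "secure",
    where kind is "code" or "params". The input placement is not modified.
    """
    allowed = {"general", "secure"}
    for target in moves.values():
        if target not in allowed:
            raise ValueError(f"unknown deployment location: {target}")
    updated = dict(current)
    updated.update(moves)
    return updated


def build_instruction(placement):
    """Serialize a placement into a hypothetical instruction message."""
    return [
        {"layer": layer, "kind": kind, "location": location}
        for (layer, kind), location in sorted(placement.items())
    ]


placement = {("conv4", "params"): "general", ("fc7", "params"): "general"}
new_placement = adjust_deployment(placement, {("conv4", "params"): "secure"})
instruction = build_instruction(new_placement)
```

The resulting `instruction` list is what the server would send to the application node in this sketch.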
[0026] In a sixth aspect, embodiments of this application provide a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the above-described method.
[0027] In a seventh aspect, embodiments of this application provide an application node, including a device body and the aforementioned computing device or security unit disposed on the device body.
[0028] The neural network model processing method, security unit, and computing device provided in the embodiments of this application deploy some of the network parameters and / or neural network code of the neural network model in the security unit, which helps resist physical attacks and protects the critical parts of the neural network model.
[0029] These or other aspects of this application will become more apparent in the following description of the embodiments.
Attached Figure Description
[0030] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0031] Figure 1 shows a schematic diagram of an AI application system provided in an embodiment of this application.
[0032] Figure 2 shows a structural block diagram of the computing device provided in an embodiment of this application.
[0033] Figure 3A shows a schematic diagram of a model deployment provided in an embodiment of this application.
[0034] Figure 3B shows a schematic diagram of another model deployment provided by an embodiment of this application.
[0035] Figure 3C shows a schematic diagram of another model deployment provided in an embodiment of this application.
[0036] Figure 4 shows an exemplary neural network structure.
[0037] Figure 5 shows a structural block diagram of the security unit provided in an embodiment of this application.
[0038] Figure 6A shows a flowchart of a processing method for a neural network model of a security unit provided in an embodiment of this application.
[0039] Figure 6B shows a flowchart of a processing method for a neural network model of another security unit provided in an embodiment of this application.
[0040] Figure 6C shows a flowchart of a processing method for a neural network model of a security unit provided in an embodiment of this application.
[0041] Figure 7 shows a flowchart of a neural network model processing method provided in an embodiment of this application.
Detailed Implementation
[0042] The embodiments of this application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are only used to explain this application, and should not be construed as limiting this application.
[0043] To enable those skilled in the art to better understand the solutions of this application, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of this application without creative effort are within the scope of protection of this application.
[0044] In the embodiments of this application, it should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations.
[0045] Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0046] In the description of the embodiments of this application, the words "example" or "for example" are used to indicate exemplification, illustration, or description. Any embodiment or design described as "example" or "for example" in the embodiments of this application is not to be construed as being more preferred or having more advantages than another embodiment or design. The use of the words "example" or "for example" is intended to present relative concepts in a clear manner.
[0047] Furthermore, in the embodiments of this application, "multiple" refers to two or more, and can therefore also be understood as "at least two". "At least one" can be understood as one or more, such as one, two, or more. For example, "including at least one" means including one, two, or more, without limiting which ones are included. For example, "including at least one of A, B, and C" may mean including A; B; C; A and B; A and C; B and C; or A, B, and C.
[0048] It should be noted that in the embodiments of this application, "and / or" describes the relationship between associated objects, indicating that there can be three relationships. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. In addition, the character " / ", unless otherwise specified, generally indicates that the associated objects before and after it are in an "or" relationship.
[0049] It should be noted that in the embodiments of this application, "connection" can be understood as electrical connection. The connection between two electrical components can be a direct or indirect connection between the two electrical components. For example, the connection between A and B can be a direct connection between A and B, or an indirect connection between A and B through one or more other electrical components.
[0050] Figure 1 illustrates a schematic diagram of an AI application system provided in an embodiment of this application. As shown in Figure 1, the system 100 includes a server 10 and multiple application nodes 20. The server 10 includes a neural network training device 11 and a neural network deployment device 12. The neural network training device 11 trains a neural network 14 using training data 13 to obtain a neural network model 15. The neural network deployment device 12 deploys the trained neural network model 15 to the application nodes 20. The application nodes 20 can execute business logic and call the neural network model 15 deployed on them to process application data to generate model output.
[0051] In this embodiment of the application, according to the classification of network structure, the neural network 14 may include feedforward neural networks (FNN), recurrent neural networks (RNN), convolutional neural networks (CNN), and deep belief networks (DBN), etc.
[0052] Based on the learning method, neural networks 14 can include: supervised learning neural networks, which require labeled data for training, such as FNN, CNN, RNN, etc.; unsupervised learning neural networks, which do not require labeled data and learn the inherent structure and patterns of the data, such as self-organizing maps (SOM) and deep belief networks (DBN); semi-supervised learning neural networks, which combine the characteristics of supervised and unsupervised learning and are trained using a small amount of labeled data and a large amount of unlabeled data; and reinforcement learning neural networks, which learn through interaction with the environment, such as deep Q networks (DQN).
[0053] Based on the activation function, neural networks 14 can include: Sigmoid activation function networks, which are neural networks that use the Sigmoid function as the activation function; and ReLU activation function networks, which are neural networks that use the Rectified Linear Unit (ReLU) function as the activation function, and are widely used because of their simple computation and relatively mild gradient vanishing problem.
[0054] Based on the training algorithm, neural network 14 may include: backpropagation neural network, which is a neural network trained using the backpropagation algorithm; and evolutionary neural network, which uses evolutionary algorithms (such as genetic algorithms) to optimize the network structure and weights.
[0055] Based on network depth, neural networks 14 can include: shallow neural networks, which are neural networks with only one or a few hidden layers; and deep neural networks, which are neural networks with multiple hidden layers and are capable of learning more complex features.
[0056] Based on network function, neural network 14 may include: classification network, a neural network used for classification tasks, such as CNN for image recognition; regression network, a neural network used for regression tasks, predicting continuous values; and generative network, a neural network capable of generating new data samples, such as generative adversarial network (GAN).
[0057] Based on the network connection method, neural networks 14 can include: fully connected networks (FCN), where each neuron is connected to all neurons in the next layer; and sparsely connected networks, where only a portion of the neurons are connected to other neurons, such as the convolutional layers in CNNs.
[0058] Based on the dynamic nature of the network, neural networks 14 can include: static neural networks, whose network structure and weights remain unchanged after training; and dynamic neural networks, whose network structure and weights can be dynamically adjusted according to the input data.
[0059] In the embodiments of this application, neural network 14 and neural network model 15 can be applied to: natural language processing (NLP), such as dialogue systems, automatic translation, speech recognition, text generation, and semantic analysis; recommendation systems, such as personalized recommendation systems that provide accurate recommendations for advertisements, content, and products; image processing, such as image recognition, image generation, image enhancement, and face recognition; video processing, such as video generation, video editing, action recognition, and video content analysis; autonomous driving, such as path planning, object detection, and behavior prediction; medical diagnosis, such as medical image analysis, disease prediction, and medical record management; financial analysis, such as risk assessment, fraud detection, and stock prediction; customer service, such as intelligent customer service systems that enable automatic responses and sentiment analysis; education, such as intelligent tutoring, homework grading, and knowledge graphs; and content creation, such as news writing, scriptwriting, and music generation.
[0060] In this embodiment, server 10 can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms, etc.
[0061] In this application embodiment, the application node 20 may include, but is not limited to: personal computers (PCs), such as laptops, desktop computers, etc.; mobile devices, such as smartphones, tablets, etc.; embedded systems, such as smart cameras, drones, autonomous vehicles, etc.; Internet of Things (IoT) devices, such as various sensors and smart devices, such as smart home devices, etc.; edge computing devices, such as edge servers, gateways, etc.; and wearable devices, such as smartwatches and health monitoring devices, etc.
[0062] Referring again to Figure 1, application node 20 is equipped with computing device 200, which processes the neural network model 15 deployed on application node 20 to generate model output based on model input. For developers and service providers of AI applications, their core asset is the neural network model trained on the server. If this network model is attacked and stolen at the application node, it can easily be used elsewhere, resulting in a loss of core interests for AI application developers and service providers.
[0063] There are two major data security issues for neural network models. The first concerns storage. Model parameter data must be encrypted when it is distributed, which a general secure channel can handle. Once the data reaches the application node, however, it must be properly encrypted and stored to prevent attackers from directly reading the model parameters. Current solutions typically store the data in a TEE (Trusted Execution Environment), but this type of storage module cannot reliably resist physical attacks such as fault injection attacks (also called error injection attacks). Fault injection attacks use lasers, voltage glitches, voltage spikes, electromagnetic radiation, and similar means to inject errors during system operation, leading to the theft of system assets. For example, when an attacker tries to directly access data stored in a TEE, the TEE system has logic, such as a conditional check, that ensures the model data is not returned. If a fault is injected, however, this check can produce the wrong result, so that the model data is read out.
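The effect of a corrupted check can be illustrated with a toy simulation. This is only a software analogy under stated assumptions: the function `read_model_data`, its `glitch` flag, and the parameter values are invented for illustration and do not model any real TEE or attack tool.

```python
def read_model_data(requester_is_trusted, glitch=False):
    """Toy access check guarding stored parameters.

    `glitch=True` models a fault that corrupts the outcome of the check,
    an illustrative simplification of what a laser or voltage glitch
    might achieve on real hardware.
    """
    secret_params = [0.12, -0.98, 0.45]  # made-up values
    allowed = requester_is_trusted
    if glitch:
        allowed = not allowed  # the injected fault flips the decision
    if allowed:
        return secret_params
    return None


normal_result = read_model_data(requester_is_trusted=False)
glitched_result = read_model_data(requester_is_trusted=False, glitch=True)
```

Without the fault the untrusted request is refused; with it, the same request returns the parameters, which is the failure mode the paragraph describes.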
[0064] The second major problem is that during the model calculation process, the calculation of model parameters and application data takes place on a general-purpose CPU or in a TEE with some security features. This calculation process can also be compromised by physical attacks, such as side-channel attacks. Side-channel attacks exploit the side-channel signals that are released during the calculation process, such as power consumption and electromagnetic radiation. Attackers can use this information to obtain core data.
[0065] To this end, embodiments of this application provide a computing device, which may be an application node or be disposed on an application node. As shown in Figure 2, the computing device 200 provided in this application embodiment includes a general computing unit 210, a security unit 220, and a storage unit 230. In specific implementations, the communication between the general computing unit 210 and the security unit 220 may be encrypted communication, and the security unit 220 and the storage unit 230 may employ encrypted storage technology.
[0066] In this embodiment, the general-purpose computing unit 210, security unit 220, and storage unit 230 shown in Figure 2 can be discrete devices or at least partially integrated together. In a specific implementation, the general-purpose computing unit 210 can be integrated with the storage unit 230 into a system-on-a-chip (SoC), while the security unit 220 is a chip independent of this SoC. In other implementations, the general-purpose computing unit 210, storage unit 230, and security unit 220 are integrated into an SoC; in this case, the security unit 220 is also referred to as an embedded security unit or an integrated security unit. In still other implementations, the general-purpose computing unit 210 and security unit 220 are integrated into an SoC, while the storage unit 230 is a separate memory. The storage unit 230 can be non-volatile memory.
[0067] In this embodiment, a portion of the neural network model is deployed in the general-purpose computing unit 210, while another portion is deployed in the security unit 220. This protects the portion deployed in the security unit 220 from physical attacks, while avoiding the limitations on storage space and computing speed that would result from deploying the entire neural network model in the security unit 220.
[0068] In some possible implementations, referring to Figure 3A, storage unit 230 stores the first neural network code and a portion of the first network parameters for the first model inference of the neural network model; security unit 220 stores the remaining first network parameters for the first model inference. In this implementation, the first model inference is the complete inference of the neural network model, the first neural network code is the complete code of the neural network model, and the first network parameters are the complete network parameters of the neural network model. Performing the first model inference on the model input yields the model output. Because part of the network parameters of the neural network model is stored in security unit 220, that part is protected from physical attacks.
[0069] Referring again to Figure 3A, when the neural network model is invoked to process the model input and obtain the model output, the general-purpose computing unit 210 performs first model inference. During the execution of the first model inference, it sends a parameter read command to the security unit 220 to obtain first network parameters that are not stored in the storage unit 230. The security unit receives the parameter read command and returns the first network parameters it stores, corresponding to the parameter read command, to the general-purpose computing unit 210. The general-purpose computing unit 210 receives the first network parameters returned by the security unit 220 corresponding to the parameter read command and performs the first model inference based on the received first network parameters.
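The parameter read flow of Figure 3A can be sketched as below. This is a toy model under stated assumptions: the class names, the `read_params` method, the scalar "weights", and the multiply-only layers are placeholders for real layer code and a real command interface.

```python
class SecurityUnit:
    def __init__(self, params):
        self._params = dict(params)
        self.read_log = []  # records requested layers, for illustration only

    def read_params(self, layer):
        # Handles a parameter read command and returns the matching parameters.
        self.read_log.append(layer)
        return self._params[layer]


class GeneralComputingUnit:
    def __init__(self, storage_params, security_unit):
        self.storage_params = storage_params  # parameters held in the storage unit
        self.security_unit = security_unit

    def params_for(self, layer):
        # Use the storage unit when it holds the parameters; otherwise
        # send a parameter read command to the security unit.
        if layer in self.storage_params:
            return self.storage_params[layer]
        return self.security_unit.read_params(layer)

    def infer(self, layers, x):
        for layer in layers:
            weight = self.params_for(layer)
            x = x * weight  # toy layer: scalar multiply
        return x


su = SecurityUnit({"layer2": 3.0})
gcu = GeneralComputingUnit({"layer1": 2.0, "layer3": 0.5}, su)
output = gcu.infer(["layer1", "layer2", "layer3"], 1.0)
```

In this sketch the fetched parameters are used once and not cached, so `layer2`'s weight is requested from the security unit each time inference runs.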
[0070] In specific implementations, the first network parameters stored in the security unit 220 may include at least a portion of the network parameters of at least one network layer. This application embodiment does not limit the specific network parameters stored in the security unit 220. In practical applications, the network parameters stored in the security unit 220 can be selected based on factors such as the network structure of the neural network model and the importance of the network parameters. For example, if a network layer has many network parameters, some of the network parameters of that network layer can be stored in the security unit 220; if a network layer has few or critical network parameters, all the network parameters of that network layer can be stored in the security unit 220.
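One way to read the selection rule above is sketched below. The source does not fix any threshold or proportion, so `small_layer_limit`, `partial_divisor`, the layer names, and the parameter counts are purely illustrative assumptions, and `layers` is taken to contain only layers already chosen for protection.

```python
def select_secure_params(layers, small_layer_limit=1000, partial_divisor=10):
    """Decide how many parameters of each layer go to the security unit.

    `layers` maps a layer name to {"count": int, "critical": bool}.
    Returns layer name -> number of parameters placed in the security unit.
    The limit and divisor are illustrative values, not from the source.
    """
    plan = {}
    for name, info in layers.items():
        if info["critical"] or info["count"] <= small_layer_limit:
            plan[name] = info["count"]  # few or critical parameters: store all
        else:
            plan[name] = info["count"] // partial_divisor  # many parameters: store some
    return plan


layers = {
    "conv2": {"count": 500, "critical": False},
    "conv4": {"count": 50000, "critical": False},
    "fc8": {"count": 20000, "critical": True},
}
plan = select_secure_params(layers)
```

A real deployment would also have to respect the storage capacity of the security unit, which this sketch does not check.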
[0071] In one example, referring to the exemplary network layer structure shown in Figure 4, storage unit 230 stores the neural network code for regularization layer 1, convolutional layer 2, pooling layer 3, convolutional layer 4, pooling layer 5, flattening layer 6, fully connected layer 7, and fully connected layer 8, as well as the network parameters of all network layers except convolutional layer 4. Security unit 220 stores the network parameters of convolutional layer 4. Continuing to refer to Figure 4, general computing unit 210 executes business logic. During this process, if it determines that the neural network model needs to be called, it calls the neural network model on the model input. Based on the network parameters stored in storage unit 230, regularization layer 1, convolutional layer 2, and pooling layer 3 are executed. When executing convolutional layer 4, the general computing unit 210 sends a parameter read command to security unit 220 to obtain the network parameters of convolutional layer 4. The security unit 220 receives the parameter read command and returns the network parameters of convolutional layer 4 to general computing unit 210. The general computing unit 210 receives the network parameters of convolutional layer 4 returned by the security unit 220, executes convolutional layer 4, and then executes pooling layer 5, flattening layer 6, fully connected layer 7, and fully connected layer 8 based on the network parameters stored in the storage unit 230 to obtain the model output.
[0072] In another example, referring to the exemplary network layer structure shown in Figure 4, storage unit 230 stores the neural network code for regularization layer 1, convolutional layer 2, pooling layer 3, convolutional layer 4, pooling layer 5, flattening layer 6, fully connected layer 7, and fully connected layer 8, as well as the network parameters of the other network layers except for convolutional layer 4 and fully connected layer 7. Security unit 220 stores the network parameters of convolutional layer 4 and fully connected layer 7. Continuing to refer to Figure 4, general computing unit 210 executes business logic. During this process, if it is determined that a neural network model needs to be called, the neural network model is called based on the model input. Based on the network parameters and neural network code stored in storage unit 230, regularization layer 1, convolutional layer 2, and pooling layer 3 are executed. When executing convolutional layer 4, a first parameter read command is sent to security unit 220, indicating that the network parameters of convolutional layer 4 should be obtained. Security unit receives the first parameter read command and returns the network parameters of convolutional layer 4 to general computing unit 210. The general computing unit 210 receives the network parameters of convolutional layer 4 returned by the security unit 220, executes convolutional layer 4, and then executes pooling layer 5 and flattening layer 6 based on the network parameters and neural network code stored in storage unit 230. Subsequently, when executing fully connected layer 7, it sends a second parameter read command to the security unit 220, which instructs to retrieve the network parameters of fully connected layer 7. The security unit 220 receives the second parameter read command and returns the network parameters of fully connected layer 7 to the general computing unit 210. 
The general computing unit 210 receives the network parameters of fully connected layer 7 returned by the security unit 220, executes fully connected layer 7, and then executes fully connected layer 8 to obtain the model output.
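The parameter-read flow in the two examples above can be sketched as follows. This is a minimal illustration, not the implementation of this application: the class `SecurityUnit`, the method `read_params`, the layer names, and the toy `affine` layer (standing in for real convolution, pooling, and fully connected computations) are all hypothetical, and the encrypted communication between the units is omitted.

```python
from typing import Dict, List

# Hypothetical layer names following the structure of Figure 4.
LAYERS = ["norm1", "conv2", "pool3", "conv4", "pool5", "flatten6", "fc7", "fc8"]


def affine(x: float, params: List[float]) -> float:
    """Toy stand-in for a real layer: y = w * x + b."""
    w, b = params
    return w * x + b


class SecurityUnit:
    """Stand-in for security unit 220: holds parameters of selected layers."""

    def __init__(self, params: Dict[str, List[float]]) -> None:
        self._params = dict(params)
        self.read_log: List[str] = []  # records the read commands served

    def read_params(self, layer: str) -> List[float]:
        """Serve one parameter read command for the named layer."""
        self.read_log.append(layer)
        return list(self._params[layer])


def run_model(x: float, storage: Dict[str, List[float]], unit: SecurityUnit) -> float:
    """General computing unit: run every layer, fetching missing parameters."""
    for layer in LAYERS:
        params = storage.get(layer)
        if params is None:
            # Parameters are not in storage unit 230, so ask the security unit.
            params = unit.read_params(layer)
        x = affine(x, params)
    return x


# Storage unit 230 holds the parameters of every layer except conv4 and fc7.
storage = {name: [1.0, 1.0] for name in LAYERS if name not in ("conv4", "fc7")}
unit = SecurityUnit({"conv4": [2.0, 0.0], "fc7": [3.0, 0.0]})
output = run_model(1.0, storage, unit)
```

In this sketch the general computing unit issues one read command per protected layer, in execution order, which corresponds to the first and second parameter read commands described above.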
[0073] In other examples, some network parameters of a network layer are stored in storage unit 230, while the remaining network parameters are stored in security unit 220. In this case, when executing the network layer, general computing unit 210 retrieves the complete network parameters of the network layer from storage unit 230 and security unit 220. The process by which general computing unit 210 retrieves network parameters from security unit 220 is similar to the previous examples and will not be described in detail here.
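As a rough sketch of the case where a single layer's parameters are split between the two units, the function below rebuilds the complete parameter vector from two partial maps. The index-based split, the function name `merge_layer_params`, and the consistency checks are assumptions made for illustration; this application does not specify how a layer's parameters are divided.

```python
from typing import Dict, List


def merge_layer_params(from_storage: Dict[int, float],
                       from_security_unit: Dict[int, float]) -> List[float]:
    """Rebuild a layer's full parameter vector from two disjoint index maps."""
    overlap = set(from_storage) & set(from_security_unit)
    if overlap:
        raise ValueError(f"indices stored in both units: {sorted(overlap)}")
    merged = {**from_storage, **from_security_unit}
    expected = list(range(len(merged)))
    if sorted(merged) != expected:
        raise ValueError("parameter indices are not contiguous from 0")
    return [merged[i] for i in expected]


# Hypothetical split: indices 0 and 2 in storage unit 230, 1 and 3 in security unit 220.
full = merge_layer_params({0: 0.5, 2: -1.0}, {1: 0.25, 3: 2.0})
```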
[0074] In some possible implementations, referring to Figure 3B, storage unit 230 stores the first neural network code and first network parameters for the first model inference of the neural network model; security unit 220 stores the second neural network code and second network parameters for the second model inference of the neural network model, and performs second model inference based on the second network parameters and second neural network code. Storage unit 230 stores the complete neural network code and network parameters for the first model inference, and security unit 220 stores the complete neural network code and network parameters for the second model inference. In this implementation, first model inference and second model inference are performed based on model input to generate model output. Through this implementation, part of the model inference of the neural network model is executed in security unit 220, which can protect this part of the computation process against physical attacks.
[0075] Referring again to Figure 3B, when calling the neural network model to process the model input and obtain the model output, the general-purpose computing unit 210 performs first model inference. During the execution of the first model inference, it sends an execution command to the security unit 220 to perform second model inference in the security unit 220. The security unit 220 receives the execution command, performs the corresponding second model inference, and returns the execution result generated by the second model inference to the general-purpose computing unit 210. The general-purpose computing unit 210 receives the execution result returned by the security unit 220. This execution result may include intermediate results or model output. If the execution result is an intermediate result, the general-purpose computing unit 210 continues to perform first model inference based on the intermediate result to obtain the model output. If the execution result is a model output, the general-purpose computing unit 210 executes business logic based on the model output.
[0076] In specific implementations, the second neural network code stored and executed by the security unit 220 may include neural network code for at least one network layer. This application embodiment does not limit the specific neural network code stored and executed by the security unit 220. In practical applications, the neural network code stored in the security unit 220 can be selected based on factors such as the network structure of the neural network model and the importance of the network layers.
[0077] In one example, referring to the exemplary network layer structure shown in Figure 4, storage unit 230 stores the neural network code and network parameters of regularization layer 1, convolutional layer 2, pooling layer 3, convolutional layer 4, pooling layer 5, flattening layer 6, and fully connected layer 7. Security unit 220 stores the neural network code and network parameters of fully connected layer 8. Continuing to refer to Figure 4, general computing unit 210 executes business logic. During this process, if it determines to call a neural network model, it calls the neural network model based on the model input. Based on the network parameters and neural network code stored in storage unit 230, it executes regularization layer 1, convolutional layer 2, pooling layer 3, convolutional layer 4, pooling layer 5, flattening layer 6, and fully connected layer 7. Then, it sends an execution command to security unit 220 to execute fully connected layer 8. This execution command may carry the intermediate results obtained from executing fully connected layer 7. Security unit 220 receives this execution command, performs calculations on fully connected layer 8 based on its stored neural network code and network parameters, and returns the execution result to general computing unit 210. The general computing unit 210 receives the execution result returned by the security unit 220. In Figure 4, the fully connected layer 8 is the last layer of the neural network model. The execution result obtained by the security unit 220 from executing the fully connected layer 8 is the model output, and the general computing unit 210 can execute business logic based on the model output. 
In this example, the first model inference includes executing the regularization layer 1, convolutional layer 2, pooling layer 3, convolutional layer 4, pooling layer 5, flattening layer 6, and fully connected layer 7, and the second model inference includes executing the fully connected layer 8.
[0078] In another example, referring to the exemplary network layer structure shown in Figure 4, storage unit 230 stores the neural network code and network parameters of regularization layer 1, convolutional layer 2, pooling layer 3, pooling layer 5, flattening layer 6, and fully connected layer 8. Security unit 220 stores the neural network code and network parameters of convolutional layer 4 and fully connected layer 7. Continuing to refer to Figure 4, general computing unit 210 executes business logic. During this process, if it determines to call a neural network model, it calls the neural network model based on the model input, and executes regularization layer 1, convolutional layer 2, and pooling layer 3 based on the network parameters and neural network code stored in storage unit 230. Subsequently, general computing unit 210 sends a first execution command to security unit 220. The first execution command carries the execution result of pooling layer 3 and instructs security unit 220 to execute convolutional layer 4. Security unit 220 receives the first execution command sent by general computing unit, performs calculation of convolutional layer 4 based on its stored neural network code and network parameters, and returns the execution result to general computing unit 210. The general computing unit 210 receives the execution result of convolutional layer 4 returned by the security unit 220. Subsequently, the general computing unit 210 executes pooling layer 5 and flattening layer 6 based on the network parameters and neural network code stored in storage unit 230. Further, the general computing unit 210 sends a second execution command to the security unit 220, carrying the execution result of flattening layer 6 and instructing the security unit to execute fully connected layer 7. 
The security unit 220 receives the second execution command from the general computing unit, performs the calculation of fully connected layer 7 based on its stored neural network code and network parameters, and returns the execution result to the general computing unit 210. The general computing unit 210 receives the execution result of fully connected layer 7 returned by the security unit 220. Subsequently, the general computing unit 210 executes fully connected layer 8 based on the network parameters and neural network code stored in storage unit 230 to obtain the model output. In this example, the first model inference includes performing regularization layer 1, convolutional layer 2, pooling layer 3, pooling layer 5, flattening layer 6, and fully connected layer 8, while the second model inference includes performing convolutional layer 4 and fully connected layer 7.
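The split-inference flow described above, in which the security unit executes some layers itself and returns only results, might be sketched as follows. The names `ExecCommand`, `SecurityUnit.execute`, and the toy `affine` layer are hypothetical, and the transport and encryption between the units are left out.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical layer names following the structure of Figure 4.
ORDER = ["norm1", "conv2", "pool3", "conv4", "pool5", "flatten6", "fc7", "fc8"]


def affine(x: float, params: List[float]) -> float:
    """Toy stand-in for a real layer: y = w * x + b."""
    w, b = params
    return w * x + b


@dataclass(frozen=True)
class ExecCommand:
    layer: str      # which layer the security unit should execute
    payload: float  # intermediate result produced by the previous layer


class SecurityUnit:
    """Stand-in for security unit 220: keeps code and parameters internal."""

    def __init__(self, params: Dict[str, List[float]]) -> None:
        self._params = dict(params)

    def execute(self, cmd: ExecCommand) -> float:
        # Only the execution result leaves the unit, never the parameters.
        return affine(cmd.payload, self._params[cmd.layer])


def infer(x: float, local_params: Dict[str, List[float]], unit: SecurityUnit) -> float:
    """General computing unit: run local layers, delegate the rest."""
    for layer in ORDER:
        if layer in local_params:
            x = affine(x, local_params[layer])        # first model inference
        else:
            x = unit.execute(ExecCommand(layer, x))   # second model inference
    return x


local_params = {name: [1.0, 0.5] for name in ORDER if name not in ("conv4", "fc7")}
unit = SecurityUnit({"conv4": [2.0, 0.0], "fc7": [3.0, 0.0]})
result = infer(1.0, local_params, unit)
```

Unlike the parameter-read flow, the stand-in security unit here exposes no method that returns parameters, so both the parameters and the computation of the delegated layers stay inside it.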
[0079] In some possible implementations, referring to Figure 3C, storage unit 230 stores the first neural network code and a portion of the first network parameters for the first model inference of the neural network model; security unit 220 stores the remaining first network parameters for the first model inference, as well as the second neural network code and second network parameters for the second model inference of the neural network model. Security unit 220 performs second model inference based on the second network parameters and the second neural network code. General computing unit 210 performs first model inference based on the first neural network code and the first network parameters. This implementation can prevent physical attacks on some network parameters and some computation processes.
[0080] Referring again to Figure 3C, when calling the neural network model to process the model input to obtain the model output, the general-purpose computing unit 210 performs first model inference. During the execution of the first model inference, it sends a parameter read command to the security unit 220 to obtain first network parameters not stored in the storage unit 230. The security unit receives the parameter read command and returns the first network parameters stored therein corresponding to the parameter read command to the general-purpose computing unit 210. The general-purpose computing unit 210 receives the first network parameters returned by the security unit 220 corresponding to the parameter read command and performs first model inference based on the received first network parameters. During the execution of the first model inference, the general-purpose computing unit 210 also sends an execution command to the security unit 220 to perform second model inference in the security unit 220. The security unit 220 receives the execution command, performs the corresponding second model inference, and returns the execution result generated by the second model inference to the general-purpose computing unit 210. The general-purpose computing unit 210 receives the execution result returned by the security unit 220. This execution result may include intermediate results or model output. If the execution result is an intermediate result, the general-purpose computing unit 210 continues to perform first model inference based on the intermediate result to obtain the model output. If the execution result is the model output, the general computing unit 210 executes the business logic based on the model output.
[0081] In one example, referring to the exemplary network layer structure shown in Figure 4, storage unit 230 stores the neural network code for regularization layer 1, convolutional layer 2, pooling layer 3, convolutional layer 4, pooling layer 5, flattening layer 6, and fully connected layer 8, as well as the network parameters of these layers except for convolutional layer 4. Security unit 220 stores the network parameters of convolutional layer 4, and the neural network code and network parameters of fully connected layer 7. Continuing to refer to Figure 4, general computing unit 210 executes business logic. During this process, if it determines to call a neural network model, it calls the neural network model based on the model input, and executes regularization layer 1, convolutional layer 2, and pooling layer 3 based on the network parameters and neural network code stored in storage unit 230. Subsequently, general computing unit 210 sends a parameter read command to security unit 220 to obtain the network parameters of convolutional layer 4. Security unit 220 receives the parameter read command sent by general computing unit 210 and returns the network parameters of convolutional layer 4 that it stores to general computing unit 210. The general computing unit 210 receives the network parameters of convolutional layer 4 returned by the security unit 220 and executes convolutional layer 4. Subsequently, based on the network parameters and neural network code stored in the storage unit 230, the general computing unit 210 executes pooling layer 5 and flattening layer 6. Further, the general computing unit 210 sends an execution command to the security unit 220, carrying the execution result of flattening layer 6 and instructing the security unit 220 to execute fully connected layer 7. 
The security unit 220 receives the execution command sent by the general computing unit, performs the calculation of fully connected layer 7 based on its stored neural network code and network parameters, and returns the execution result to the general computing unit 210. The general computing unit 210 receives the execution result of fully connected layer 7 returned by the security unit 220. Subsequently, the general computing unit 210 executes fully connected layer 8 based on the network parameters and neural network code stored in the storage unit 230 to obtain the model output.
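The combined mode in this example, where one layer's parameters are read from the security unit and another layer is executed inside it, could be sketched as below. The `Command` enum, the `handle` dispatcher, the `plan` mapping, and the toy `affine` layer are hypothetical names chosen for illustration only.

```python
from enum import Enum
from typing import Dict, List, Optional, Union

# Hypothetical layer names following the structure of Figure 4.
ORDER = ["norm1", "conv2", "pool3", "conv4", "pool5", "flatten6", "fc7", "fc8"]


class Command(Enum):
    READ_PARAMS = "read_params"
    EXECUTE = "execute"


def affine(x: float, params: List[float]) -> float:
    """Toy stand-in for a real layer: y = w * x + b."""
    w, b = params
    return w * x + b


class SecurityUnit:
    """Stand-in for security unit 220 serving both command types."""

    def __init__(self, readable: Dict[str, List[float]],
                 executable: Dict[str, List[float]]) -> None:
        self._readable = dict(readable)      # parameters returned on read
        self._executable = dict(executable)  # parameters that never leave the unit

    def handle(self, command: Command, layer: str,
               payload: Optional[float] = None) -> Union[List[float], float]:
        if command is Command.READ_PARAMS:
            return list(self._readable[layer])
        if command is Command.EXECUTE:
            if payload is None:
                raise ValueError("execution command needs an intermediate result")
            return affine(payload, self._executable[layer])
        raise ValueError(f"unknown command: {command}")


def infer(x: float, storage: Dict[str, List[float]],
          plan: Dict[str, Command], unit: SecurityUnit) -> float:
    """General computing unit: local layers, fetched parameters, or delegation."""
    for layer in ORDER:
        if layer in storage:
            x = affine(x, storage[layer])
        elif plan[layer] is Command.READ_PARAMS:
            x = affine(x, unit.handle(Command.READ_PARAMS, layer))
        else:
            x = unit.handle(Command.EXECUTE, layer, x)
    return x


storage = {name: [1.0, 0.0] for name in ORDER if name not in ("conv4", "fc7")}
plan = {"conv4": Command.READ_PARAMS, "fc7": Command.EXECUTE}
unit = SecurityUnit(readable={"conv4": [2.0, 1.0]}, executable={"fc7": [0.5, 0.0]})
model_output = infer(3.0, storage, plan, unit)
```

Keeping the readable and executable parameter sets separate in the sketch means a read command cannot be used to extract the parameters of a layer that is meant to run only inside the security unit.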
[0082] In some possible implementations, the general-purpose computing unit 210 is further configured to: receive a neural network model and deployment instructions sent by a server, the deployment instructions indicating the neural network code and network parameters deployed in the general-purpose computing unit, and the neural network code and / or network parameters deployed in the security unit; and deploy the neural network model in the general-purpose computing unit and the security unit according to the deployment instructions. In a specific implementation, the network parameters deployed in the security unit 220 include at least a portion of the network parameters of at least one neural network layer executed by the general-purpose computing unit 210. The neural network code deployed in the security unit 220 includes the neural network code of at least one neural network layer.
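One possible way to represent deployment instructions and apply them is sketched below. The `Placement` record, the `deploy` function, and the dictionary layout are assumptions; this application does not define an instruction format. The sketch places whole layers only (not a portion of a layer's parameters) and assumes that a layer whose code runs in the security unit also keeps its parameters there.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Placement:
    """Hypothetical per-layer deployment instruction."""
    code_in_security_unit: bool = False
    params_in_security_unit: bool = False


# layer name -> (identifier of the layer's code, layer parameters)
Model = Dict[str, Tuple[str, List[float]]]


def deploy(model: Model, instructions: Dict[str, Placement]):
    """Split a model into a general-unit part and a security-unit part."""
    general = {"code": {}, "params": {}}
    secure = {"code": {}, "params": {}}
    for layer, (code, params) in model.items():
        # Layers without an instruction default to the general computing unit.
        place = instructions.get(layer, Placement())
        if place.code_in_security_unit and not place.params_in_security_unit:
            raise ValueError(f"{layer}: code in security unit requires its parameters there")
        (secure if place.code_in_security_unit else general)["code"][layer] = code
        (secure if place.params_in_security_unit else general)["params"][layer] = params
    return general, secure


model = {
    "conv4": ("conv_code", [1.0]),
    "fc7": ("fc_code", [2.0]),
    "fc8": ("fc_code", [3.0]),
}
instructions = {
    "conv4": Placement(params_in_security_unit=True),
    "fc7": Placement(code_in_security_unit=True, params_in_security_unit=True),
}
general_part, secure_part = deploy(model, instructions)
```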
[0083] In some possible implementations, the general-purpose computing unit 210 and the storage unit 230 reside in a Trusted Execution Environment (TEE), which executes the first model inference. In this implementation, the TEE can communicate with the security unit 220 to obtain network parameters from the security unit 220, instruct the security unit 220 to execute the second model inference, and receive the execution result of the second model inference. To ensure the security of terminal devices, terminal device security frameworks represented by ARM (Advanced RISC Machines, where RISC stands for reduced instruction set computer) TrustZone have emerged. Under the ARM TrustZone framework, system-level security is achieved by dividing the hardware and software resources of the system-on-chip (SoC) into two worlds: the normal world and the secure world (also called the non-secure domain and the secure domain). These two worlds correspond to the rich execution environment (REE) and the trusted execution environment (TEE), respectively. The REE and TEE run on the same physical device and each runs an operating system. The REE runs client applications (CAs) with low security requirements; the TEE, on the other hand, runs trusted applications (TAs) that require security guarantees and provides a secure execution environment for authorized TAs. CAs and TAs communicate through a communication mechanism provided by ARM TrustZone, in the manner of a client and a server.
[0084] In some possible implementations, the general-purpose computing unit 210 is deployed with a virtual machine that performs the first model inference. In this implementation, the virtual machine can communicate with the security unit 220, and the virtual machine can obtain network parameters from the security unit 220, instruct the security unit 220 to perform the second model inference, and receive the execution result of the second model inference.
[0085] This application embodiment also provides a security unit that can implement the functions of the security unit 220 shown in Figure 2. In some embodiments, as shown in Figure 5, the security unit 500 may include a secure storage module 510 and a secure computing module 520. Access permissions for the secure storage module 510 may be configured so that only the secure computing module 520 can access it.
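The access restriction between the two modules can be modeled in software as follows. This is only a conceptual sketch: in a real security unit the restriction would be enforced by hardware rather than by an identity check, and the class and method names (`bind`, `provision`, `handle_read`) are hypothetical.

```python
from typing import Dict, List, Optional


class SecureStorageModule:
    """Stand-in for secure storage module 510."""

    def __init__(self) -> None:
        self._data: Dict[str, List[float]] = {}
        self._owner: Optional[object] = None

    def bind(self, owner: object) -> None:
        """Bind the storage to exactly one computing module."""
        if self._owner is not None:
            raise PermissionError("storage is already bound")
        self._owner = owner

    def _check(self, caller: object) -> None:
        if self._owner is None or caller is not self._owner:
            raise PermissionError("only the bound secure computing module may access storage")

    def read(self, caller: object, key: str) -> List[float]:
        self._check(caller)
        return list(self._data[key])

    def write(self, caller: object, key: str, values: List[float]) -> None:
        self._check(caller)
        self._data[key] = list(values)


class SecureComputingModule:
    """Stand-in for secure computing module 520, the only permitted accessor."""

    def __init__(self, storage: SecureStorageModule) -> None:
        self._storage = storage
        storage.bind(self)

    def provision(self, layer: str, values: List[float]) -> None:
        self._storage.write(self, layer, values)

    def handle_read(self, layer: str) -> List[float]:
        return self._storage.read(self, layer)


storage = SecureStorageModule()
compute = SecureComputingModule(storage)
compute.provision("conv4", [0.1, 0.2])
params = compute.handle_read("conv4")
```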
[0086] In some embodiments, the secure storage module 510 can store a portion of the first network parameters for the first model inference of the neural network model, and the secure computing module 520 can provide the first network parameters stored in the secure storage module 510 to the general-purpose computing unit. In this embodiment, the first model inference of the neural network model is its complete model inference, and the first network parameters are the complete parameters of the neural network model. In a specific implementation, the secure computing module 520 can receive a parameter read command from the general-purpose computing unit and return the first network parameters that are stored in the secure storage module 510 and correspond to the parameter read command to the general-purpose computing unit. In some possible embodiments, in the secure unit 500, the first network parameters stored in the secure storage module 510 include at least a portion of the network parameters of at least one neural network layer of the first model inference.
[0087] In one example, referring to the exemplary network layer structure shown in Figure 4, the secure storage module 510 of the secure unit 500 stores the network parameters of convolutional layer 4. Continuing to refer to Figure 4, when the general computing unit executes convolutional layer 4, it sends a parameter read command to the secure unit 500 to obtain the network parameters of convolutional layer 4. The secure computing module 520 of the secure unit 500 receives the parameter read command, reads the network parameters of convolutional layer 4 stored in the secure storage module 510, and returns the network parameters of convolutional layer 4 to the general computing unit. The general computing unit receives the network parameters of convolutional layer 4 returned by the secure computing module 520 and executes convolutional layer 4. This example mainly illustrates the process of the secure unit 500 providing network parameters. For examples of the general computing unit executing other network layers, please refer to the examples mentioned above in this specification, which will not be repeated here.
[0088] In another example, referring to the exemplary network layer structure shown in Figure 4, the secure storage module 510 stores the network parameters of convolutional layer 4 and fully connected layer 7. Continuing to refer to Figure 4, when the general computing unit executes convolutional layer 4, it sends a first parameter read command to the secure unit 500, instructing it to retrieve the network parameters of convolutional layer 4. The secure computing module 520 of the secure unit 500 receives the first parameter read command, reads the network parameters of convolutional layer 4 stored in the secure storage module 510, and returns the network parameters of convolutional layer 4 to the general computing unit. The general computing unit receives the network parameters of convolutional layer 4 returned by the secure computing module 520 and executes convolutional layer 4. When the general computing unit executes fully connected layer 7, it sends a second parameter read command to the secure unit 500, instructing it to retrieve the network parameters of fully connected layer 7. The secure computing module 520 of the secure unit 500 receives the second parameter read command, reads the network parameters of fully connected layer 7 stored in the secure storage module 510, and returns the network parameters of fully connected layer 7 to the general computing unit. The general computing unit receives the network parameters of fully connected layer 7 returned by the secure unit 500 and executes fully connected layer 7. This example mainly illustrates the process by which the secure unit 500 provides network parameters. For examples of the general computing unit executing other network layers, please refer to the preceding examples in this specification; they will not be repeated here.
[0089] In some embodiments, the secure storage module 510 can store the second neural network code and the second network parameters for the second model inference of the neural network model, and the secure computing module 520 can perform the second model inference based on the second network parameters and the second neural network code. In this embodiment, the second model inference is a partial model inference of the neural network model. In a specific implementation, the secure computing module 520 can receive an execution command from the general computing unit, perform the corresponding second model inference based on the second neural network code and the second network parameters corresponding to the execution command stored in the secure storage module 510, and return the execution result to the general computing unit. The execution result includes model output or intermediate results. In some possible embodiments, in the secure unit 500, the second neural network code stored in the secure storage module 510 includes the neural network code of at least one neural network layer.
[0090] In one example, referring to the exemplary network layer structure shown in Figure 4, the secure unit 500 stores the neural network code and network parameters of fully connected layer 8. Continuing to refer to Figure 4, the general computing unit executes regularization layer 1, convolutional layer 2, pooling layer 3, convolutional layer 4, pooling layer 5, flattening layer 6, and fully connected layer 7, and then sends an execution command to the secure unit 500 to execute fully connected layer 8. This execution command may carry the intermediate results obtained from executing fully connected layer 7. The secure computing module 520 of the secure unit 500 receives the execution command, performs the calculation of fully connected layer 8 based on the neural network code and network parameters of fully connected layer 8 stored in the secure storage module 510, and returns the execution result to the general computing unit. The general computing unit receives the execution result returned by the secure computing module 520. In Figure 4, fully connected layer 8 is the last layer of the neural network model, so the execution result obtained by the secure computing module 520 in executing fully connected layer 8 is the model output. The general computing unit can execute business logic based on the model output. In this example, the first model inference includes executing regularization layer 1, convolutional layer 2, pooling layer 3, convolutional layer 4, pooling layer 5, flattening layer 6, and fully connected layer 7, while the second model inference includes executing fully connected layer 8.
[0091] In another example, referring to the exemplary network layer structure shown in Figure 4, the secure storage module 510 of the secure unit 500 stores the neural network code and network parameters of convolutional layer 4 and fully connected layer 7. Continuing to refer to Figure 4, the general computing unit executes business logic. During this process, if it is determined that a neural network model needs to be called, the neural network model is called based on the model input, executing regularization layer 1, convolutional layer 2, and pooling layer 3. Subsequently, the general computing unit sends a first execution command to the secure unit 500. The first execution command carries the execution result of pooling layer 3 and instructs the secure unit 500 to execute convolutional layer 4. The secure computing module 520 of the secure unit 500 receives the first execution command sent by the general computing unit, performs the calculation of convolutional layer 4 based on the neural network code and network parameters of convolutional layer 4 stored in the secure storage module 510, and returns the execution result to the general computing unit. The general computing unit receives the execution result of convolutional layer 4 returned by the secure computing module 520. Subsequently, the general computing unit executes pooling layer 5 and flattening layer 6, and sends a second execution command to the secure unit 500. The second execution command carries the execution result of flattening layer 6 and instructs the secure unit 500 to execute fully connected layer 7. The secure computing module 520 of the secure unit 500 receives the second execution command sent by the general computing unit, performs the calculation of fully connected layer 7 based on the neural network code and network parameters of fully connected layer 7 stored in the secure storage module 510, and returns the execution result to the general computing unit. 
The general computing unit receives the execution result of fully connected layer 7 returned by the secure computing module 520. In this example, the first model inference includes executing regularization layer 1, convolutional layer 2, pooling layer 3, pooling layer 5, flattening layer 6, and fully connected layer 8, and the second model inference includes executing convolutional layer 4 and fully connected layer 7.
[0092] In some embodiments, the secure storage module 510 can store a portion of the first network parameters for the first model inference of the neural network model, and store the second neural network code and second network parameters for the second model inference of the neural network model. In this embodiment, the first model inference is a portion of the neural network model's model inference, and the second model inference is another portion of the neural network model's model inference. The secure computing module 520 can provide the first network parameters stored in the secure storage module 510 to the general-purpose computing unit, and perform second model inference based on the second network parameters and second neural network code stored in the secure storage module 510, and return the execution result to the general-purpose computing unit, the execution result including model output or intermediate results.
[0093] In a specific implementation, the secure computing module 520 can receive parameter read commands from the general computing unit, return the first network parameters stored in the secure storage module 510 and corresponding to the parameter read commands to the general computing unit. The secure computing module 520 can also receive execution commands from the general computing unit, perform corresponding second model inference based on the second neural network code and second network parameters corresponding to the execution commands, and return the execution results to the general computing unit. The execution results include model outputs or intermediate results.
[0094] In one example, referring to the exemplary network layer structure shown in Figure 4, the secure storage module 510 of the secure unit 500 stores the network parameters of convolutional layer 4, as well as the neural network code and network parameters of fully connected layer 7. Continuing to refer to Figure 4, the general computing unit executes business logic. During this process, if it determines to call a neural network model, it calls the neural network model based on the model input, executing regularization layer 1, convolutional layer 2, and pooling layer 3. Subsequently, the general computing unit sends a parameter read command to the secure unit 500 to obtain the network parameters of convolutional layer 4. The secure computing module 520 of the secure unit 500 receives the parameter read command sent by the general computing unit and returns the network parameters of convolutional layer 4 stored in the secure storage module 510 to the general computing unit. The general computing unit receives the network parameters of convolutional layer 4 returned by the secure computing module 520 and executes convolutional layer 4. Subsequently, the general computing unit executes pooling layer 5 and flattening layer 6, and sends an execution command to the secure unit 500; the execution command carries the execution result of flattening layer 6 and instructs the secure unit 500 to execute fully connected layer 7. The secure computing module 520 receives the execution command sent by the general computing unit, performs the calculation of fully connected layer 7 based on the neural network code and network parameters of fully connected layer 7 stored in the secure storage module 510, and returns the execution result to the general computing unit. The general computing unit receives the execution result of fully connected layer 7 returned by the secure unit 500. 
Subsequently, the general computing unit executes the fully connected layer 8 to obtain the model output.
[0095] This application provides a method for processing a neural network model, which can be implemented by either security unit 220 or security unit 500. The implementation of this method will be described below with reference to Figures 6A, 6B, and 6C.
[0096] In some embodiments, referring to Figure 6A, the method includes the following steps.
[0097] Step S601A: The secure unit stores the first network parameters of the first model inference of the neural network model.
[0098] In this embodiment, in the computing device 200, the storage unit 230 stores the first neural network code and a portion of the first network parameters for the first model inference of the neural network model. The security unit 220 can store a portion of the first network parameters for the first model inference of the neural network model. In this embodiment, the first model inference is the complete inference of the neural network model, the first neural network code is the complete code of the neural network model, and the first network parameters are the complete network parameters of the neural network model. A portion of the first network parameters is stored in the storage unit 230, and another portion is stored in the security unit 220. Performing first model inference based on the model input can obtain the model output. Storing a portion of the network parameters of the neural network model in the security unit 220 can prevent this portion of the network parameters from being subjected to physical attacks.
[0099] In a specific implementation, the security unit 220 can receive the first network parameters of the first model inference of the neural network model sent by the general computing unit 210, and store the first network parameters of the first model inference of the neural network model in the security unit 220.
[0100] In specific implementations, the first network parameters stored in the security unit 220 within the computing device 200 may include at least a portion of the network parameters from at least one network layer. This application embodiment does not limit the specific network parameters stored in the security unit 220. In practical applications, the network parameters stored in the security unit 220 can be selected based on factors such as the network structure of the neural network model and the importance of the network parameters. For example, if a network layer has many network parameters, some of those parameters can be stored in the security unit 220; if a network layer has few or critical network parameters, all of those parameters can be stored in the security unit 220.
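The selection rule in the example above can be sketched as follows. This is a minimal illustration only: the `LayerInfo` record, the `max_secure` threshold and the choice of the leading indices as the protected subset are assumptions and are not specified by this application.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LayerInfo:
    name: str
    num_params: int
    critical: bool = False  # hypothetical importance flag


def split_indices(layer: LayerInfo, max_secure: int = 4) -> Tuple[List[int], List[int]]:
    """Return (indices kept in the security unit, indices kept in the storage unit)."""
    if layer.critical or layer.num_params <= max_secure:
        # Few or critical parameters: protect the whole layer.
        secure = list(range(layer.num_params))
    else:
        # Many parameters: protect only a subset (here, the first max_secure).
        secure = list(range(max_secure))
    general = list(range(len(secure), layer.num_params))
    return secure, general
```

For instance, `split_indices(LayerInfo("conv4", 10))` keeps four parameter indices in the security unit and six in the storage unit, while a layer flagged as critical is kept entirely in the security unit.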
[0101] In step S602A, the security unit provides the stored first network parameters to the general computing unit.
[0102] In this embodiment, in the computing device 200, when the general-purpose computing unit 210 calls the neural network model to process the model input to obtain the model output, the general-purpose computing unit 210 performs first model inference. During the execution of the first model inference, it sends a parameter read command to the security unit 220 to obtain first network parameters that are not stored in the storage unit 230. The security unit 220 receives the parameter read command and returns the first network parameters it stores that correspond to the parameter read command to the general-purpose computing unit 210. The general-purpose computing unit 210 receives the first network parameters returned by the security unit 220 that correspond to the parameter read command and performs the first model inference based on the received first network parameters.
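A minimal sketch of this parameter read exchange is given below. The names `ParamReadCommand`, `SecurityUnit.handle_read` and `load_layer_params`, and the representation of parameters as index-to-value mappings, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ParamReadCommand:
    layer: str  # hypothetical identifier of the layer whose parameters are requested


class SecurityUnit:
    """Hypothetical model of security unit 220 holding part of the parameters."""

    def __init__(self, stored: Dict[str, Dict[int, float]]) -> None:
        self._stored = stored

    def handle_read(self, cmd: ParamReadCommand) -> Dict[int, float]:
        # Return the stored parameters that correspond to the command.
        return dict(self._stored[cmd.layer])


def load_layer_params(
    layer: str,
    storage_unit: Dict[str, Dict[int, float]],
    security_unit: SecurityUnit,
) -> List[float]:
    """Merge the parameters in the storage unit with those read from the security unit."""
    merged = dict(storage_unit.get(layer, {}))
    merged.update(security_unit.handle_read(ParamReadCommand(layer)))
    return [merged[i] for i in sorted(merged)]


storage = {"conv4": {0: 0.1, 2: 0.3}}
unit = SecurityUnit({"conv4": {1: 0.2}})
params = load_layer_params("conv4", storage, unit)
```

Here the general-purpose computing unit holds two of the three parameters of the layer and obtains the missing one only at inference time.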
[0103] In some embodiments, referring to Figure 6B, part of the model inference of the neural network model is performed in the secure unit 220, which can prevent this part of the computation process from being physically attacked. In these embodiments, the method includes the following steps.
[0104] Step S601B: The secure unit stores the second neural network code and the second network parameters for the second model inference of the neural network model.
[0105] In this embodiment, in the computing device 200, storage unit 230 stores the first neural network code and the first network parameters for the first model inference of the neural network model, and security unit 220 stores the second neural network code and the second network parameters for the second model inference of the neural network model. Storage unit 230 stores the complete neural network code and network parameters for the first model inference, and security unit 220 stores the complete neural network code and network parameters for the second model inference.
[0106] In a specific implementation, the security unit 220 can receive the second neural network code and the second network parameters of the second model inference of the neural network model sent by the general computing unit 210, and store them in the security unit 220. The second neural network code is configured to be executable by the security unit 220, for example as an application running in the security unit 220.
[0107] In step S602B, the security unit executes second model inference based on the second network parameters and the second neural network code to obtain the execution result, which includes model output or intermediate result.
[0108] In step S603B, the security unit returns the execution result to the general computing unit.
[0109] In this implementation, first model inference and second model inference are performed based on the model input to generate the model output. Through this implementation, part of the model inference of the neural network model is performed in the security unit 220, which can prevent this part of the computation process from being physically attacked.
[0110] In this embodiment, when the general-purpose computing unit 210 calls the neural network model to process the model input and obtain the model output, it performs first model inference. During the execution of the first model inference, it sends an execution command to the security unit 220 to perform second model inference in the security unit 220. The security unit 220 receives the execution command, performs the corresponding second model inference based on its stored second neural network code and second network parameters, and returns the execution result generated by the second model inference to the general-purpose computing unit 210. The general-purpose computing unit 210 receives the execution result returned by the security unit 220. The execution result may include intermediate results or model output. If the execution result is an intermediate result, the general-purpose computing unit 210 continues to perform first model inference based on the intermediate result to obtain the model output. If the execution result is a model output, the general-purpose computing unit 210 executes business logic based on the model output.
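The handling of the execution result described above can be sketched as follows. The names `ResultKind`, `ExecutionResult`, `SecurityUnit.handle_execute` and `infer`, as well as the toy layer arithmetic, are hypothetical; the sketch only illustrates the branch between an intermediate result and a model output.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ResultKind(Enum):
    INTERMEDIATE = "intermediate"
    MODEL_OUTPUT = "model_output"


@dataclass
class ExecutionResult:
    kind: ResultKind
    data: List[float]


class SecurityUnit:
    """Hypothetical security unit holding second code and second parameters."""

    def __init__(self, weights: List[List[float]], is_last_stage: bool) -> None:
        self._weights = weights
        self._is_last_stage = is_last_stage

    def handle_execute(self, activations: List[float]) -> ExecutionResult:
        # Second model inference: a toy matrix-vector product.
        out = [sum(w * a for w, a in zip(row, activations)) for row in self._weights]
        kind = ResultKind.MODEL_OUTPUT if self._is_last_stage else ResultKind.INTERMEDIATE
        return ExecutionResult(kind, out)


def infer(x: List[float], su: SecurityUnit) -> List[float]:
    hidden = [max(0.0, v) for v in x]          # first model inference before the hand-off
    result = su.handle_execute(hidden)         # execution command
    if result.kind is ResultKind.INTERMEDIATE:
        # Continue the first model inference on the intermediate result.
        return [v + 1.0 for v in result.data]
    return result.data                         # already the model output


mid_output = infer([1.0, -2.0], SecurityUnit([[1.0, 1.0]], is_last_stage=False))
last_output = infer([1.0, -2.0], SecurityUnit([[1.0, 1.0]], is_last_stage=True))
```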
[0111] In specific implementations, the second neural network code stored in the security unit 220 may include neural network code for at least one network layer. This application embodiment does not limit the specific neural network code stored in the security unit 220. In practical applications, the neural network code stored in the security unit 220 can be selected based on factors such as the network structure of the neural network model and the importance of the network layers.
[0112] In some embodiments, referring to Figure 6C, the method includes the following steps.
[0113] In step S601C, the secure unit stores the first network parameters of the first model inference of the neural network model, the second neural network code of the second model inference of the neural network model, and the second network parameters.
[0114] In this embodiment, in the computing device 200, the storage unit 230 stores the first neural network code and some of the first network parameters for the first model inference of the neural network model; the security unit 220 stores the remaining first network parameters for the first model inference, as well as the second neural network code and second network parameters for the second model inference of the neural network model. This implementation method can prevent some network parameters and some computational processes from being physically attacked.
[0115] In a specific implementation, the security unit 220 can receive, from the general computing unit 210, the remaining first network parameters of the first model inference of the neural network model, together with the second neural network code and second network parameters of the second model inference of the neural network model, and store them in the security unit 220. The second neural network code is configured to be executable by the security unit 220, for example as an application running in the security unit 220.
[0116] In specific implementations, the second neural network code stored in the security unit 220 may include neural network code for at least one network layer. This application embodiment does not limit the specific neural network code stored and executed by the security unit 220. In practical applications, the neural network code stored in the security unit 220 can be selected based on factors such as the network structure of the neural network model and the importance of the network layers.
[0117] In specific implementations, the first network parameters stored in the security unit 220 may include at least a portion of the network parameters of at least one network layer. This application embodiment does not limit the specific network parameters stored in the security unit 220. In practical applications, the network parameters stored in the security unit 220 can be selected based on factors such as the network structure of the neural network model and the importance of the network parameters. For example, if a network layer has many network parameters, some of the network parameters of that network layer can be stored in the security unit 220; if a network layer has few or critical network parameters, all the network parameters of that network layer can be stored in the security unit 220.
[0118] In step S602C, the security unit provides the stored first network parameters to the general computing unit.
[0119] In this embodiment, when the general-purpose computing unit 210 calls the neural network model to process the model input to obtain the model output, it performs first model inference. During the execution of the first model inference, it sends a parameter read command to the security unit 220 to obtain first network parameters that are not stored in the storage unit 230. The security unit receives the parameter read command and returns the first network parameters it stores that correspond to the parameter read command to the general-purpose computing unit 210. The general-purpose computing unit 210 receives the first network parameters returned by the security unit 220 that correspond to the parameter read command and performs the first model inference based on the received first network parameters.
[0120] In step S603C, the security unit executes second model inference based on the second network parameters and the second neural network code to obtain the execution result, which includes model output or intermediate result.
[0121] In step S604C, the security unit returns the execution result to the general computing unit.
[0122] In this embodiment, during the execution of the first model inference, the general-purpose computing unit 210 also sends an execution command to the security unit 220 to execute the second model inference in the security unit 220. The security unit 220 receives the execution command, executes the corresponding second model inference, and returns the execution result generated by the second model inference to the general-purpose computing unit 210. The general-purpose computing unit 210 receives the execution result returned by the security unit 220. This execution result may include intermediate results or model output. If the execution result is an intermediate result, the general-purpose computing unit 210 continues to perform the first model inference based on the intermediate result to obtain the model output. If the execution result is a model output, the general-purpose computing unit 210 executes business logic based on the model output.
[0123] It should be understood that the execution order of steps S602C and S603C is not limited in this embodiment. Based on the network parameters and the deployment of the neural network code, step S602C can be executed after step S603C, or step S603C can be executed after step S602C.
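One way to see why the order is not fixed is to derive the command sequence from the placement of each layer. In the sketch below, the function `command_sequence` and the mode labels `"local"`, `"secure_params"` and `"secure_exec"` are assumptions introduced for illustration.

```python
from typing import List, Tuple


def command_sequence(placement: List[Tuple[str, str]]) -> List[str]:
    """Derive the commands sent to the security unit from per-layer placement.

    placement lists (layer_name, mode) in execution order, where mode is
    "local", "secure_params" (parameters read from the security unit) or
    "secure_exec" (layer executed inside the security unit).
    """
    commands: List[str] = []
    for name, mode in placement:
        if mode == "secure_params":
            commands.append(f"READ {name}")
        elif mode == "secure_exec":
            commands.append(f"EXEC {name}")
        elif mode != "local":
            raise ValueError(f"unknown mode: {mode}")
    return commands


read_first = command_sequence([("conv4", "secure_params"), ("fc7", "secure_exec")])
exec_first = command_sequence([("conv2", "secure_exec"), ("fc7", "secure_params")])
```

With the first placement the parameter read precedes the execution command, and with the second placement the order is reversed.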
[0124] This application provides a method for processing a neural network model, which can be implemented by a server and application nodes, such as server 10 and application node 20 shown in Figure 1. The server is responsible for distributing the trained neural network model to the application nodes. The application node includes a computing device, which can be computing device 200 shown in Figure 2. The computing device includes a security unit, which can be security unit 500 shown in Figure 5. In specific implementations, the communication between the server and the computing device is encrypted, and the communication channel can be Wi-Fi, a mobile communication network, etc. As shown in Figure 7, the method includes the following steps.
[0125] In step S701, the server sends a neural network model and deployment instructions to the application node. The deployment instructions are used to indicate the neural network code and network parameters deployed in the general computing unit, and the neural network code and / or network parameters deployed in the security unit.
[0126] In this embodiment, within an application node including computing device 200, the server can instruct that a portion of the neural network model be deployed in general-purpose computing unit 210, and another portion of the neural network model be deployed in secure unit 220. Thus, the portion deployed in secure unit 220 is protected from physical attacks, while avoiding the limitation of storage space and computing speed caused by deploying the entire neural network model in secure unit 220.
[0127] In some possible implementations, the server instructs the deployment of the first neural network code and a portion of the first network parameters for the first model inference of the neural network model in the general computing unit 210, and the deployment of the remaining first network parameters for the first model inference in the secure unit 220. In this implementation, the first model inference is the complete inference of the neural network model, the first neural network code is the complete code of the neural network model, and the first network parameters are the complete network parameters of the neural network model. Performing the first model inference based on the model input can obtain the model output. The storage of a portion of the network parameters of the neural network model in the secure unit 220 can prevent these network parameters from being subjected to physical attacks.
[0128] In some possible implementations, the server can instruct that the first neural network code and first network parameters for the first model inference of the neural network model be deployed in the general-purpose computing unit 210, and that the second neural network code and second network parameters for the second model inference of the neural network model be deployed in the secure unit 220. In this case, the complete neural network code and network parameters for the first model inference are deployed in the general-purpose computing unit 210, and the complete neural network code and network parameters for the second model inference are deployed in the secure unit 220. Through this implementation, part of the model inference of the neural network model is performed in the secure unit 220, which can prevent this part of the computation process from being physically attacked.
[0129] In some possible implementations, the server may instruct that the first neural network code and a portion of the first network parameters for the first model inference of the neural network model be deployed in the general computing unit 210, and that the remaining first network parameters for the first model inference, together with the second neural network code and second network parameters for the second model inference of the neural network model, be deployed in the security unit 220. This implementation can prevent physical attacks on some network parameters and parts of the computation process.
[0130] In a specific implementation, the server can configure the first neural network code as an application that can be executed by the general computing unit 210, and configure the second neural network code as an application that can be executed by the security unit 220.
[0131] In step S702, the general computing unit of the application node receives the neural network model and deployment instructions sent by the server.
[0132] In this embodiment, within system 100, application node 20 and server 10 can communicate via a trusted channel. In specific implementations, the server can encrypt the neural network model and deployment instructions, and the general-purpose computing unit 210 can decrypt and verify the data sent by the server. Encryption, decryption, and verification are existing technologies, and will not be elaborated upon in this embodiment.
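As one illustration of the verification step only, the sketch below checks the integrity of a received package with an HMAC from the Python standard library. The pre-shared symmetric key and the function names `sign_package` and `verify_package` are assumptions; this application does not specify the scheme, and encryption of the package is omitted here.

```python
import hashlib
import hmac


def sign_package(key: bytes, package: bytes) -> bytes:
    """Server side: compute an HMAC-SHA256 tag over the serialized package."""
    return hmac.new(key, package, hashlib.sha256).digest()


def verify_package(key: bytes, package: bytes, tag: bytes) -> bool:
    """Node side: recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_package(key, package), tag)


shared_key = b"k" * 32                      # hypothetical pre-shared key
package = b"model-bytes||deployment-instructions"
tag = sign_package(shared_key, package)
```

A package whose contents were altered in transit would fail `verify_package`, and in this sketch the general-purpose computing unit would then refuse to deploy it.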
[0133] In step S703, the general computing unit deploys the neural network model in the general computing unit and the security unit according to the deployment instructions.
[0134] In this embodiment, if the server instructs that the first neural network code and a portion of the first network parameters of the first model inference of the neural network model be deployed in the general computing unit 210, and the remaining first network parameters of the first model inference be deployed in the security unit 220, then the general computing unit 210 stores the first neural network code and a portion of the first network parameters of the first model inference of the neural network model in the storage unit 230, and instructs the security unit 220 to store the remaining first network parameters of the first model inference.
[0135] If the server instructs that the first neural network code and the first network parameters of the first model inference of the neural network model be deployed in the general computing unit 210, and that the second neural network code and the second network parameters of the second model inference of the neural network model be deployed in the security unit 220, then the general computing unit 210 stores the first neural network code and the first network parameters of the first model inference in the storage unit 230, and instructs the security unit 220 to store the second neural network code and the second network parameters of the second model inference.
[0136] If the server instructs that the first neural network code and a portion of the first network parameters of the first model inference of the neural network model be deployed in the general computing unit 210, and that the remaining first network parameters of the first model inference, together with the second neural network code and second network parameters of the second model inference of the neural network model, be deployed in the security unit 220, then the general computing unit 210 stores the first neural network code and the portion of the first network parameters of the first model inference in the storage unit 230, and instructs the security unit 220 to store the remaining first network parameters of the first model inference and the second neural network code and second network parameters of the second model inference.
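The three deployment cases above can be sketched with a single routine. The `DeploymentInstruction` and `Layer` records and the `deploy` function are hypothetical, and for brevity the sketch assigns parameters to one side or the other per layer, whereas the embodiments also allow only part of a layer's parameters to be placed in the security unit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Layer:
    name: str
    code: str            # placeholder for the layer's neural network code
    params: List[float]


@dataclass
class DeploymentInstruction:
    # Layers whose parameters (but not code) go to the security unit.
    secure_params: Set[str] = field(default_factory=set)
    # Layers whose code and parameters both go to the security unit.
    secure_code: Set[str] = field(default_factory=set)


Store = Dict[str, Dict[str, object]]


def deploy(model: List[Layer], instr: DeploymentInstruction) -> Tuple[Store, Store]:
    """Split a model between the storage unit and the security unit."""
    storage_unit: Store = {"code": {}, "params": {}}
    security_unit: Store = {"code": {}, "params": {}}
    for layer in model:
        if layer.name in instr.secure_code:
            # Second model inference: code and parameters live in the security unit.
            security_unit["code"][layer.name] = layer.code
            security_unit["params"][layer.name] = layer.params
        elif layer.name in instr.secure_params:
            # First model inference with parameters held by the security unit.
            storage_unit["code"][layer.name] = layer.code
            security_unit["params"][layer.name] = layer.params
        else:
            storage_unit["code"][layer.name] = layer.code
            storage_unit["params"][layer.name] = layer.params
    return storage_unit, security_unit


model = [Layer("conv2", "c2", [1.0]), Layer("conv4", "c4", [2.0]), Layer("fc7", "f7", [3.0])]
instr = DeploymentInstruction(secure_params={"conv4"}, secure_code={"fc7"})
storage_unit, security_unit = deploy(model, instr)
```

Leaving `secure_code` empty corresponds to the first case, leaving `secure_params` empty corresponds to the second, and setting both corresponds to the third.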
[0137] After deploying the neural network model in the general computing unit 210 and the security unit 220 according to the deployment instructions, the general computing unit 210 can execute business logic and, when it is determined in the execution of business logic to call the neural network model, call the neural network model.
[0138] In some possible implementations, the server can also adjust the deployment location of the neural network code and / or network parameters, where the deployment location is either the general computing unit or the security unit, and send deployment instructions reflecting this adjustment to the application nodes. In specific implementations, the deployment location of the neural network code and / or network parameters can be adjusted when the neural network model is updated, or at regular intervals. This security enhancement mechanism makes the system harder to attack and improves the security of the neural network model.
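A minimal sketch of such an adjustment is shown below, assuming a simple rotation over candidate layers. The function `next_secure_layers`, the `budget` limit on how many layers fit in the security unit, and the rotation policy itself are hypothetical; this application does not prescribe how the new locations are chosen, and a real system might prefer a less predictable policy.

```python
from typing import List, Set


def next_secure_layers(candidates: List[str], budget: int, epoch: int) -> Set[str]:
    """Pick which candidate layers are placed in the security unit for this epoch.

    The window of `budget` layers advances by one position per epoch and
    wraps around the candidate list.
    """
    if not candidates or budget <= 0:
        return set()
    budget = min(budget, len(candidates))
    start = epoch % len(candidates)
    return {candidates[(start + i) % len(candidates)] for i in range(budget)}


candidates = ["conv2", "conv4", "fc7", "fc8"]
epoch0 = next_secure_layers(candidates, budget=2, epoch=0)
epoch1 = next_secure_layers(candidates, budget=2, epoch=1)
epoch3 = next_secure_layers(candidates, budget=2, epoch=3)
```

The server would then issue new deployment instructions reflecting the selected set each time the model is updated or the period elapses.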
[0139] In this embodiment, after deploying a neural network model on the application node and computing device, the neural network model can be used to generate model output based on the model input.
[0140] In some possible implementations, when invoking a neural network model to process model inputs and obtain model outputs, the general-purpose computing unit 210 performs first model inference. During the execution of the first model inference, it sends a parameter read command to the security unit 220 to obtain first network parameters not stored in the storage unit 230. The security unit receives the parameter read command and returns the first network parameters it stores, corresponding to the parameter read command, to the general-purpose computing unit 210. The general-purpose computing unit 210 receives the first network parameters returned by the security unit 220 corresponding to the parameter read command and performs the first model inference based on the received first network parameters.
[0141] In some possible implementations, when invoking a neural network model to process model inputs and obtain model outputs, the general-purpose computing unit 210 performs first model inference. During the execution of the first model inference, it sends an execution command to the security unit 220 to perform second model inference within the security unit 220. The security unit 220 receives the execution command, performs the corresponding second model inference, and returns the execution result generated by the second model inference to the general-purpose computing unit 210. The general-purpose computing unit 210 receives the execution result returned by the security unit 220. This execution result may include intermediate results or model outputs. If the execution result is an intermediate result, the general-purpose computing unit 210 continues to perform first model inference based on the intermediate result to obtain model outputs. If the execution result is model output, the general-purpose computing unit 210 executes business logic based on the model output.
[0142] In some possible implementations, when invoking a neural network model to process model inputs and obtain model outputs, the general-purpose computing unit 210 performs first model inference. During the execution of the first model inference, it sends a parameter read command to the security unit 220 to obtain first network parameters not stored in the storage unit 230. The security unit receives the parameter read command and returns its stored first network parameters corresponding to the parameter read command to the general-purpose computing unit 210. The general-purpose computing unit 210 receives the first network parameters returned by the security unit 220 corresponding to the parameter read command and performs first model inference based on the received first network parameters. During the execution of the first model inference, the general-purpose computing unit 210 also sends an execution command to the security unit 220 to perform second model inference in the security unit 220. The security unit 220 receives the execution command, performs the corresponding second model inference, and returns the execution result generated by the second model inference to the general-purpose computing unit 210. The general-purpose computing unit 210 receives the execution result returned by the security unit 220. The execution result may include intermediate results or model outputs. If the execution result is an intermediate result, the general-purpose computing unit 210 continues to perform first model inference based on the intermediate result to obtain model outputs. If the execution result is the model output, the general computing unit 210 executes the business logic based on the model output.
[0143] This application also provides a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the above-described method.
[0144] This application embodiment also provides an application node, including a device body and the aforementioned computing device or security unit disposed on the device body. The application node may include, but is not limited to: personal computers (PCs), such as laptops and desktop computers; mobile devices, such as smartphones and tablets; embedded systems, such as smart cameras, drones, and autonomous vehicles; Internet of Things (IoT) devices, such as various sensors and smart devices, such as smart home devices; edge computing devices, such as edge servers and gateways; and wearable devices, such as smartwatches and health monitoring devices.
[0145] This application embodiment also provides a server, which is used to: send a neural network model and deployment instructions to an application node, wherein the deployment instructions are used to indicate the neural network code and network parameters deployed in a general computing unit, and the neural network code and / or network parameters deployed in a security unit.
[0146] In some possible implementations, the server is also used to: adjust the deployment location of the neural network code and / or network parameters, where the deployment location is either the general computing unit or the security unit; and send deployment instructions reflecting the adjustment to the application nodes.
[0147] In practice, a server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms, etc.
[0148] The above are merely preferred embodiments of this application and are not intended to limit this application in any way. Although this application has disclosed preferred embodiments as above, it is not intended to limit this application. Any person skilled in the art can make some modifications or alterations to the above-disclosed technical content to create equivalent embodiments without departing from the scope of the technical solution of this application. Any simple modifications, equivalent changes and alterations made to the above embodiments based on the technical essence of this application without departing from the scope of the technical solution of this application shall still fall within the scope of the technical solution of this application.
Claims
1. A computing device, comprising: a storage unit configured to store first neural network code for first model inference of a neural network model and at least a portion of first network parameters for the first model inference; a security unit configured to: store the remaining first network parameters for the first model inference; and / or store second neural network code and second network parameters for second model inference of the neural network model, and perform the second model inference based on the second network parameters and the second neural network code; and a general-purpose computing unit configured to perform the first model inference based on the first neural network code and the first network parameters.
2. The computing device as claimed in claim 1, wherein the general-purpose computing unit is further configured to: receive a neural network model and deployment instructions sent by a server, the deployment instructions indicating the neural network code and network parameters to be deployed in the general-purpose computing unit, and the neural network code and / or network parameters to be deployed in the security unit; and deploy the neural network model in the general-purpose computing unit and the security unit according to the deployment instructions.
3. The computing device as claimed in claim 1, wherein the security unit is further configured to: receive a parameter read command from the general-purpose computing unit and return, to the general-purpose computing unit, the first network parameters that it stores and that correspond to the parameter read command; and / or receive an execution command from the general-purpose computing unit, perform the corresponding second model inference based on the second neural network code and second network parameters corresponding to the execution command, and return an execution result to the general-purpose computing unit, the execution result including a model output or an intermediate result.
4. The computing device as claimed in claim 1, wherein the general-purpose computing unit is further configured to: send a parameter read command to the security unit, and receive the first network parameters corresponding to the parameter read command returned by the security unit; and / or send an execution command to the security unit, and receive an execution result returned by the security unit after executing the second model inference corresponding to the execution command, the execution result including a model output or an intermediate result.
5. The computing device as claimed in claim 1, wherein the first network parameters stored in the security unit include at least a portion of the network parameters of at least one neural network layer of the first model inference; and / or the second neural network code includes neural network code for at least one neural network layer.
6. The computing device according to any one of claims 1 to 5, wherein the general-purpose computing unit and the storage unit are located in a trusted execution environment, which performs the first model inference.
7. The computing device according to any one of claims 1 to 5, wherein the general-purpose computing unit is equipped with a virtual machine, which executes the first model inference.
8. A security unit, comprising: a secure storage module configured to store a portion of first network parameters for first model inference of a neural network model, and / or to store second neural network code and second network parameters for second model inference of the neural network model; and a secure computing module configured to provide the stored first network parameters to a general-purpose computing unit, and / or to perform the second model inference based on the second network parameters and the second neural network code.
9. The security unit as claimed in claim 8, wherein the secure computing module is further configured to: receive a parameter read command from the general-purpose computing unit and return, to the general-purpose computing unit, the first network parameters that the security unit stores and that correspond to the parameter read command; and / or receive an execution command from the general-purpose computing unit, perform the corresponding second model inference based on the second neural network code and second network parameters corresponding to the execution command, and return an execution result to the general-purpose computing unit, the execution result including a model output or an intermediate result.
10. The security unit as claimed in claim 8, wherein the first network parameters stored in the security unit include at least a portion of the network parameters of at least one neural network layer of the first model inference; and / or the second neural network code includes neural network code for at least one neural network layer.
11. A method for processing a neural network model, applied to a security unit, the method comprising: storing a portion of first network parameters for first model inference of the neural network model, and / or second neural network code and second network parameters for second model inference of the neural network model; and providing the stored first network parameters to a general-purpose computing unit, and / or performing the second model inference based on the second network parameters and the second neural network code and returning an execution result to the general-purpose computing unit, the execution result including a model output or an intermediate result.
12. The method of claim 11, wherein providing the stored first network parameters to the general-purpose computing unit includes: receiving a parameter read command from the general-purpose computing unit, and returning to the general-purpose computing unit the stored first network parameters that correspond to the parameter read command.
13. The method of claim 11, wherein performing the second model inference based on the second network parameters and the second neural network code and returning the execution result to the general computing unit comprises: receiving an execution command from the general computing unit; performing the corresponding second model inference based on the second neural network code and the second network parameters corresponding to the execution command, to obtain the execution result; and returning the execution result to the general computing unit.
14. The method of claim 11, wherein the first network parameters include at least a portion of the network parameters of at least one neural network layer used in the first model inference; and/or the second neural network code includes neural network code for at least one neural network layer.
15. A method for processing a neural network model, applied to a general computing unit, the method comprising: receiving a neural network model and deployment instructions sent by a server, wherein the deployment instructions indicate the neural network code and network parameters to be deployed in the general computing unit, and the neural network code and/or network parameters to be deployed in a security unit; and deploying the neural network model in the general computing unit and the security unit according to the deployment instructions.
16. The method of claim 15, wherein deploying the neural network model in the general computing unit and the security unit according to the deployment instructions comprises: storing, in the general computing unit, first neural network code for first model inference of the neural network model and at least a portion of the first network parameters for the first model inference; and instructing the security unit to store the remaining first network parameters for the first model inference, and/or to store second neural network code and second network parameters for second model inference of the neural network model.
17. The method of claim 16, further comprising: sending a parameter read command to the security unit and receiving the first network parameters, corresponding to the parameter read command, returned by the security unit; and/or sending an execution command to the security unit and receiving the execution result returned by the security unit after it performs the second model inference corresponding to the execution command.
18. The method of claim 15, wherein the network parameters deployed in the security unit include at least a portion of the network parameters of at least one neural network layer; and/or the neural network code deployed in the security unit includes neural network code for at least one neural network layer.
19. A method for processing a neural network model, applied to a server, the method comprising: sending a neural network model and deployment instructions to an application node, wherein the deployment instructions indicate the neural network code and network parameters to be deployed in a general computing unit, and the neural network code and/or network parameters to be deployed in a security unit.
20. The method of claim 19, further comprising: adjusting the deployment location of the neural network code and/or the network parameters, wherein the deployment location is the general computing unit or the security unit; and sending, to the application node, deployment instructions based on the adjustment.
21. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 11 to 20.
22. An application node, comprising a device body and, disposed on the device body, a computing device as claimed in any one of claims 1 to 7 or a security unit as claimed in any one of claims 8 to 10.
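Illustrative sketch (not part of the claims). The split deployment and the two command types recited in claims 11 to 18 can be pictured with the following minimal Python sketch. It is offered only as an aid to reading, under stated assumptions: the class names `SecurityUnit` and `GeneralComputingUnit`, the placement labels `"general"`, `"secure_params"` and `"secure_exec"`, and the toy layers `scale` and `shift` are all hypothetical and do not appear in the application. The encrypted communication and protected storage described elsewhere in the application are deliberately omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Vector = List[float]
LayerFn = Callable[[Vector, Vector], Vector]
Model = List[Tuple[str, LayerFn, Vector]]  # (layer name, layer code, layer parameters)


def scale(x: Vector, w: Vector) -> Vector:
    """Toy layer (assumption): element-wise multiply."""
    return [xi * wi for xi, wi in zip(x, w)]


def shift(x: Vector, w: Vector) -> Vector:
    """Toy layer (assumption): element-wise add."""
    return [xi + wi for xi, wi in zip(x, w)]


@dataclass
class SecurityUnit:
    """Hypothetical stand-in for the security unit (no encryption modeled)."""
    first_params: Dict[str, Vector] = field(default_factory=dict)
    second_model: Dict[str, Tuple[LayerFn, Vector]] = field(default_factory=dict)

    def read_params(self, layer: str) -> Vector:
        """Parameter read command: return the stored first network parameters."""
        return list(self.first_params[layer])

    def execute(self, layer: str, x: Vector) -> Vector:
        """Execution command: run second model inference inside the unit."""
        fn, w = self.second_model[layer]
        return fn(x, w)


@dataclass
class GeneralComputingUnit:
    """Hypothetical stand-in for the general computing unit."""
    secure: SecurityUnit
    code: Dict[str, LayerFn] = field(default_factory=dict)
    local_params: Dict[str, Vector] = field(default_factory=dict)
    order: List[str] = field(default_factory=list)
    placement: Dict[str, str] = field(default_factory=dict)

    def deploy(self, model: Model, placement: Dict[str, str]) -> None:
        """Split the model between the two units per the deployment instructions."""
        self.order = [name for name, _, _ in model]
        self.placement = dict(placement)
        for name, fn, w in model:
            where = placement[name]
            if where == "general":          # code and parameters stay local
                self.code[name], self.local_params[name] = fn, w
            elif where == "secure_params":  # code local, parameters in the security unit
                self.code[name] = fn
                self.secure.first_params[name] = w
            elif where == "secure_exec":    # code and parameters in the security unit
                self.secure.second_model[name] = (fn, w)
            else:
                raise ValueError(f"unknown placement: {where}")

    def infer(self, x: Vector) -> Vector:
        """Run the layers in order, calling the security unit where required."""
        for name in self.order:
            where = self.placement[name]
            if where == "general":
                x = self.code[name](x, self.local_params[name])
            elif where == "secure_params":
                x = self.code[name](x, self.secure.read_params(name))
            else:
                x = self.secure.execute(name, x)  # intermediate result or model output
        return x


if __name__ == "__main__":
    model = [("l1", scale, [2.0, 2.0]), ("l2", shift, [1.0, -1.0]), ("l3", scale, [0.5, 3.0])]
    placement = {"l1": "general", "l2": "secure_params", "l3": "secure_exec"}
    gcu = GeneralComputingUnit(SecurityUnit())
    gcu.deploy(model, placement)
    print(gcu.infer([1.0, 2.0]))  # [1.5, 9.0]
```

In this sketch the `placement` dictionary plays the role of the deployment instructions of claims 15 and 19; re-sending a changed dictionary and calling `deploy` again would correspond loosely to the adjustment of claim 20.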