Outsourcing deep learning system supporting privacy protection

By combining a data obfuscation module and an untrusted GPU device in a trusted execution environment, and utilizing random sparse matrices for data desensitization and gradient synchronization updates, the problems of long computation time and privacy leakage in existing technologies are solved, achieving efficient privacy protection and precision loss control.

CN115587379B · Active · Publication Date: 2026-06-23 · SHANGHAI JIAOTONG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHANGHAI JIAOTONG UNIV
Filing Date
2022-11-11
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing technologies for protecting the privacy of deep learning model data suffer from drawbacks such as long computation time, lack of support for division operations, increased noise in ciphertext, and potential privacy leaks and data misuse.

Method used

Deploy the data obfuscation module in a trusted execution environment, use a random sparse matrix to de-identify the data, and train the neural network on an untrusted GPU device. Update the data obfuscation module synchronously with gradient information to reduce the number of data interactions.

Benefits of technology

While ensuring security, the system improves training efficiency, reduces the number of interface interactions, and keeps the accuracy loss below 5%.

✦ Generated by Eureka AI based on patent content.


Abstract

An outsourced deep learning system supporting privacy protection. A data encryption module and a data decryption module encrypt and decrypt data using the public key and the private key, respectively, of a generated public-private key pair. A data obfuscation module desensitizes the decrypted information from the data decryption module and outputs the desensitized data to a neural network computing module. The neural network computing module trains and tests a neural network model on the desensitized data, obtains obfuscated gradient parameters during backpropagation, and returns them to the data obfuscation module for synchronous updates, improving the convergence speed and accuracy of the whole training system. The present application protects data privacy within a trusted execution environment and outsources deep learning model training to a cloud server while guaranteeing integrity and confidentiality, so that users can obfuscate and perturb data in the trusted execution environment and share the desensitized data, while the original data and the obfuscation process remain invisible to attackers.

Description

Technical Field

[0001] This invention relates to a technology in the field of information security, specifically an outsourced deep learning system that supports privacy protection.

Background Technology

[0002] Currently, solutions for protecting deep learning model data privacy based on trusted execution environments generally fall into two categories. One is to completely port the model into a trusted execution environment (such as an Intel SGX enclave). Its advantage lies in ensuring computational confidentiality without sacrificing model accuracy. However, trusted environments typically have limited memory, and frequent paging leads to low computational performance. The other approach is partial model porting. However, for complex models, existing work often involves dozens or even hundreds of interface interactions between the SGX enclave and the model, resulting in significant performance overhead.

Summary of the Invention

[0003] This invention addresses the shortcomings of existing technologies, such as long computation time, lack of support for division operations, increased ciphertext noise, privacy leakage, and data misuse, by proposing an outsourced deep learning system that supports privacy protection. Data privacy is protected within a trusted execution environment, and the training of deep learning models is outsourced to a cloud server with integrity and confidentiality guaranteed. Users can thus obfuscate and perturb local data in the trusted execution environment and share the desensitized data, while the original data and the obfuscation process remain invisible to attackers.

[0004] This invention is achieved through the following technical solution:

[0005] This invention relates to an outsourced deep learning system that supports privacy protection, comprising: a data encryption module, a data decryption module and a data obfuscation module located in a trusted execution environment, and a neural network computing module located in an untrusted GPU device. The data encryption and decryption modules encrypt and decrypt data respectively using the public and private keys in a generated public-private key pair. The data obfuscation module desensitizes the decrypted information from the data decryption module and outputs the desensitized data to the neural network computing module. The neural network computing module trains and tests the neural network model based on the desensitized data. During the backpropagation of the neural network, obfuscated gradient parameters are obtained and sent back to the data obfuscation module for synchronous updates, thereby improving the convergence speed and accuracy of the entire training system.

[0006] The aforementioned data desensitization process refers to constructing and initializing a random sparse matrix in the data obfuscation module of the trusted execution environment. The process is essentially a linear transformation of the original data $X_i$ into the obfuscated output $X'_i = W X_i$, where the matrix $W$ is a randomly generated sparse matrix. When the random sparse matrix is initialized, at least one element in each row and each column must be non-zero, so that each element of the original data is linearly mapped at least once.

[0007] Technical effect

[0008] This invention achieves information hiding by deploying a data obfuscation module based on a random sparse matrix in a trusted execution environment (TEE). The neural network is trained by a framework located on an untrusted GPU device outside the TEE, and the data obfuscation module in the TEE is updated with the neural network's gradient information. Compared with existing technologies, this invention keeps the accuracy loss controllable at less than 5% and limits data interaction between the TEE and the outside environment to two per neural network iteration, thus improving training efficiency.

Attached Figure Description

[0009] Figure 1 is a schematic diagram of the system of the present invention;

[0010] Figure 2 is a schematic diagram of the data desensitization process in the embodiment;

[0011] Figure 3 is a schematic diagram of the synchronous update of the data obfuscation module in the embodiment;

[0012] Figure 4 is a schematic diagram of parameter passing in the embodiment.

Detailed Implementation

[0013] As shown in Figure 1, this embodiment relates to an outsourced deep learning system that supports privacy protection, including: a data encryption module, a data decryption module and a data obfuscation module located in a trusted execution environment, and a neural network computing module located in an untrusted GPU device. The data encryption module and the data decryption module encrypt and decrypt data according to the public key and private key in the generated public-private key pair, respectively. The data obfuscation module desensitizes the decrypted information from the data decryption module and outputs the desensitized data to the neural network computing module. The neural network computing module trains and tests the neural network model according to the desensitized data, obtains the obfuscated gradient parameters, and outputs them to the data obfuscation module for synchronous updates, thereby improving the convergence speed and accuracy of the entire training system.

[0014] After initialization, the data obfuscation module performs a linear transformation on the original data using a random sparse matrix during forward propagation to obtain desensitized data, which is then output to the neural network computing module in the GPU. During backward propagation, it receives obfuscation gradient parameters from the neural network computing module and synchronously updates the random sparse matrix units.

[0015] The neural network computing module employs various deep neural network models, including convolutional neural networks and graph neural networks. During forward propagation, it receives desensitized data from the data obfuscation module as network input, and during backward propagation, it sends obfuscation gradient parameters to the data obfuscation module.

[0016] As shown in Figure 1 and Figure 2, the data desensitization process refers to constructing and initializing a random sparse matrix in the data obfuscation module of the trusted execution environment. The process is essentially a linear transformation of the original data $X_i$ into the obfuscated output $X'_i = W X_i$, where the matrix $W$ is a randomly generated sparse matrix. When the random sparse matrix is initialized, at least one element in each row and each column must be non-zero, so that each element of the original data is linearly mapped at least once.

[0017] The sparsity of the random sparse matrix is specified by the user; a sparsity of 0.5 means that 50% of the elements in each column are zero.
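The initialization constraints described above (a user-chosen per-column sparsity, and at least one non-zero element in every row and column) can be illustrated with a minimal Python sketch. The function names `random_sparse_matrix` and `obfuscate`, the value range of the non-zero entries, and the row-repair strategy are all assumptions made for illustration; the patent text does not specify them.

```python
import random


def random_sparse_matrix(rows, cols, sparsity, seed=None):
    """Build a rows x cols matrix in which roughly `sparsity` of each column
    is zero, while every row and every column keeps a non-zero entry.
    (Hypothetical helper; value range and repair strategy are assumptions.)"""
    rng = random.Random(seed)
    zeros_per_col = int(round(sparsity * rows))
    keep_per_col = max(1, rows - zeros_per_col)  # each column keeps >= 1 entry
    W = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        for i in rng.sample(range(rows), keep_per_col):
            # Magnitude in [0.1, 1.0] with a random sign, so never exactly zero.
            W[i][j] = rng.uniform(0.1, 1.0) * rng.choice((-1.0, 1.0))
    # Repair any all-zero row; this slightly lowers the sparsity of one column.
    for i in range(rows):
        if not any(W[i]):
            W[i][rng.randrange(cols)] = rng.uniform(0.1, 1.0)
    return W


def obfuscate(W, x):
    """Linear transformation of one sample x (a list of floats) by W."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


W = random_sparse_matrix(rows=8, cols=6, sparsity=0.5, seed=0)
x_obf = obfuscate(W, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```

Plain lists are used only to keep the sketch dependency-free; an enclave implementation would more plausibly use a compact sparse representation.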

[0018] The desensitized data, which is derived from the original data obtained after decryption, is input into the neural network computing module for training according to the gradient descent algorithm. During backpropagation, the data obfuscation module located in the trusted execution environment receives the gradient values and weight values from the first layer of the model and is updated synchronously. Because the initial non-zero positions differ in each training run, the sparse matrix obtained after training and the neural network model located in the untrusted GPU device also differ from run to run, which effectively resists differential attacks.

[0019] The synchronous update only changes the non-zero elements of the sparse matrix; elements that are zero remain zero.

[0020] The random sparse matrix is equivalent to a generated random key. The original data is linearly transformed and mapped to randomized data of a specified size using this key. Both the storage of the key and the mapping of the data are protected by the trusted execution environment.

[0021] As shown in Figure 3, the synchronous update refers to the following: when the neural network computing module updates using the gradient descent algorithm, the gradient information in each round of backward iteration of the neural network is $\delta_0 = A^{T} \delta_1$, where $\delta_1$ is the gradient of the input layer of the neural network in the untrusted GPU device and $A$, with elements $A_{ij}$, is the random sparse matrix in the data obfuscation module. After the gradient of the random sparse matrix is calculated, the non-zero elements in the random sparse matrix are updated as follows:
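The update formula itself does not appear in this text. Purely as an illustration, the sketch below assumes a plain gradient-descent step restricted to the non-zero positions, with the obfuscation layer modeled as y = A x so that the standard backpropagation gradient with respect to A[i][j] is delta1[i] * x[j]. The learning rate `lr` and all function names are hypothetical and not taken from the patent.

```python
def sparsity_mask(A):
    """Record which positions are non-zero at initialization."""
    return [[a != 0.0 for a in row] for row in A]


def input_gradient(A, delta1):
    """delta0 = A^T delta1, the relation stated in the text."""
    rows, cols = len(A), len(A[0])
    return [sum(A[i][j] * delta1[i] for i in range(rows)) for j in range(cols)]


def masked_update(A, mask, x, delta1, lr=0.01):
    """One assumed gradient-descent step on A. Only positions marked in
    `mask` change; positions that started at zero stay exactly zero."""
    return [
        [
            A[i][j] - lr * delta1[i] * x[j] if mask[i][j] else 0.0
            for j in range(len(A[i]))
        ]
        for i in range(len(A))
    ]


A = [[1.0, 0.0], [0.0, 2.0]]
mask = sparsity_mask(A)
A_next = masked_update(A, mask, x=[1.0, 1.0], delta1=[0.5, -0.5], lr=0.1)
```

Keeping a separate mask, rather than testing for zero on every step, means an entry that happens to reach zero during training is not frozen by accident.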

[0022] As shown in Figure 4, each iteration of neural network training involves one round trip of parameter transfer between the TEE and the GPU to achieve the synchronous update, specifically:

[0023] 1) The data obfuscation module within the TEE completes the obfuscation transformation of the original data and sends the desensitized data $X'_i$ to the neural network computing module located in the untrusted GPU device;

[0024] 2) The neural network computation module completes one round of forward and backward propagation, and simultaneously calculates the input layer gradient δ1;

[0025] 3) The neural network calculation module outputs the input layer gradient δ1 to the data obfuscation module within the TEE, and completes the synchronous update of the data obfuscation module within the TEE;

[0026] 4) Conduct the next round of training, repeating steps 1)-3).
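The four steps above can be sketched as a toy training loop that counts boundary crossings. Everything here is a stand-in: `ObfuscationModule` plays the role of the TEE-side module, `GpuModel` is a single linear unit with squared-error loss in place of a real network, and the masked gradient-descent update is an assumption rather than the patent's formula.

```python
import random


class ObfuscationModule:
    """Stand-in for the data obfuscation module inside the TEE (hypothetical)."""

    def __init__(self, W, lr=0.01):
        self.W = [row[:] for row in W]
        self.mask = [[w != 0.0 for w in row] for row in W]
        self.lr = lr
        self._x = None

    def forward(self, x):
        self._x = x  # the original sample stays inside this object
        return [sum(w * v for w, v in zip(row, x)) for row in self.W]

    def backward(self, delta1):
        # Assumed update: gradient descent on the non-zero positions only.
        for i, row in enumerate(self.W):
            for j in range(len(row)):
                if self.mask[i][j]:
                    row[j] -= self.lr * delta1[i] * self._x[j]


class GpuModel:
    """Stand-in for the network on the untrusted GPU: one linear unit."""

    def __init__(self, n_inputs, lr=0.01, seed=0):
        rng = random.Random(seed)
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
        self.lr = lr

    def train_step(self, x_obf, target):
        pred = sum(v * z for v, z in zip(self.v, x_obf))
        err = pred - target  # derivative of 0.5 * (pred - target) ** 2
        delta1 = [err * v for v in self.v]  # gradient w.r.t. the network input
        self.v = [v - self.lr * err * z for v, z in zip(self.v, x_obf)]
        return delta1


def train(tee, gpu, samples, epochs=1):
    """Run the loop and return the number of TEE/GPU boundary crossings."""
    crossings = 0
    for _ in range(epochs):
        for x, y in samples:
            x_obf = tee.forward(x)             # step 1: TEE -> GPU
            crossings += 1
            delta1 = gpu.train_step(x_obf, y)  # step 2: forward and backward
            tee.backward(delta1)               # step 3: GPU -> TEE
            crossings += 1
    return crossings


tee = ObfuscationModule([[1.0, 0.0], [0.0, 1.0], [0.5, 0.0]])
gpu = GpuModel(n_inputs=3)
samples = [([1.0, 2.0], 1.0), ([0.5, -1.0], 0.0)]
total_crossings = train(tee, gpu, samples, epochs=3)
```

In this sketch the crossing count is two per processed sample regardless of how deep the GPU-side model is, which is the property the description emphasizes.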

[0027] In practical experiments, Intel SGX 2.8 was used as the trusted execution environment. The neural network module, located on an untrusted GPU device, was implemented using PyTorch C++ on a machine with a 2.3GHz Intel Core i5 processor and an NVIDIA GeForce GTX 1070 GPU. MNIST and CIFAR were selected as experimental datasets, and LeNet and ResNet20 / ResNet32 were used as the neural networks on the untrusted GPU device. The experimental data showed that, compared with a pure neural network without data obfuscation, this approach resulted in a 1% decrease in accuracy on the MNIST dataset and a 4% decrease on the CIFAR dataset. The convergence speed did not change significantly, and both losses were considered acceptable. Furthermore, the Pearson correlation coefficient was used to quantify the correlation between the original and desensitized data. The coefficient obtained was 0.168, which is approximately equivalent to the information hiding provided by Laplace noise with a privacy parameter of 0.7 in differential privacy techniques.
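For reference, the Pearson correlation coefficient mentioned above can be computed as in the following sketch. How the experiments paired original and desensitized values is not described here, so the sketch simply assumes two flattened sequences of equal length; the function name `pearson` is illustrative.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_b)


original = [0.0, 0.2, 0.9, 0.4, 0.7]       # illustrative values only
desensitized = [0.3, -0.5, 0.1, 0.8, -0.2]  # illustrative values only
r = pearson(original, desensitized)
```

A coefficient close to 0 indicates little linear relationship between the two sequences, while values near 1 or -1 indicate a strong one.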

[0028] Comparison with TensorSCONE: TensorSCONE migrates the machine learning model entirely into the SGX enclave, enabling secure model training and prediction on untrusted infrastructure. Although the TensorSCONE architecture has been optimized, the limited EPC size in SGX results in relatively low computational performance, about 30% lower than TensorFlow Lite. This invention instead trains the neural network on the GPU while performing obfuscation and desensitization of the original data in a trusted execution environment, thus shortening model training time while maintaining security.

[0029] By porting the entire system, TensorSCONE ensures security without sacrificing model accuracy. In contrast, this invention obfuscates the original data, achieving data desensitization at the cost of a controllable loss of accuracy. In experiments, this invention resulted in approximately 1% accuracy loss on the MNIST dataset and approximately 4% accuracy loss on the more complex CIFAR dataset.

[0030] Comparison with Occlum: Occlum is likewise a computing system that combines a TEE (Trusted Execution Environment) with a GPU. The difference is that it splits the linear-layer and nonlinear-layer computations of the neural network between untrusted and trusted devices, placing the nonlinear layers in the secure region. The drawback of this approach is that complex deep neural networks contain dozens or even hundreds of "linear layer + nonlinear layer" pairs, each of which involves interface interactions between the TEE and the outside during forward and backward propagation. This invention, however, adds a data obfuscation layer only in front of the neural network located on the untrusted GPU device, so that only one interface interaction is needed in each of forward and backward propagation, thus improving overall training efficiency. When switching between secure and non-secure regions, Occlum uses additive perturbation to desensitize the data; this invention adds perturbation through matrix multiplication, i.e., a linear transformation, resulting in higher security.

[0031] Occlum performs encryption and decryption when switching between secure and non-secure regions, which generally does not affect model accuracy, whereas this invention sacrifices a controllable amount of model accuracy.

[0032] Compared with existing technologies, this invention uses a data desensitization algorithm based on random sparse matrices. After obfuscation, the desensitized data has little correlation with the original data, and attackers cannot recover the original data from the data output by the TEE. In addition, this invention involves only two interactions between the trusted execution environment and the GPU in each neural network iteration, reducing the performance loss caused by interface interactions, whereas in other similar schemes the number of interface interactions grows in proportion to the complexity of the neural network.

[0033] The above-described specific implementations can be partially adjusted by those skilled in the art in different ways without departing from the principles and purpose of the present invention. The scope of protection of the present invention is defined by the claims and is not limited to the above-described specific implementations. All implementation schemes within the scope of the claims are bound by the present invention.

Claims

1. A privacy-preserving outsourced deep learning system, characterized in that it comprises: a data encryption module, a data decryption module and a data obfuscation module located in a trusted execution environment, and a neural network computing module located in an untrusted GPU device. The data encryption module and the data decryption module encrypt and decrypt data based on the public and private keys in the generated public-private key pair, respectively. The data obfuscation module desensitizes the decrypted information from the data decryption module and outputs the desensitized data to the neural network computing module. The neural network computing module trains and tests the neural network model based on the desensitized data. During the backpropagation of the neural network, the obfuscated gradient parameters are obtained and sent back to the data obfuscation module for synchronous updates, thereby improving the convergence speed and accuracy of the entire training system. The data desensitization process refers to constructing and initializing a random sparse matrix in the data obfuscation module of the trusted execution environment; the process is essentially a linear transformation of the original data $X_i$ into the obfuscated output $X'_i = W X_i$, where the matrix $W$ is a randomly generated sparse matrix. When the random sparse matrix is initialized, at least one element in each row and each column must be non-zero, so that each element of the original data is linearly mapped at least once.

2. The privacy-preserving outsourced deep learning system according to claim 1, characterized in that, after initialization, the data obfuscation module performs a linear transformation on the original data using a random sparse matrix during forward propagation to obtain desensitized data, which is then output to the neural network computing module in the GPU. During backward propagation, it receives obfuscation gradient parameters from the neural network computing module and synchronously updates the random sparse matrix units.

3. The privacy-preserving outsourced deep learning system according to claim 1 or 2, characterized in that the neural network computing module employs various deep neural network models, including convolutional neural networks and graph neural networks. During forward propagation, it receives desensitized data from the data obfuscation module as network input, and during backward propagation, it sends obfuscation gradient parameters to the data obfuscation module.

4. The privacy-preserving outsourced deep learning system according to claim 1, characterized in that the desensitized data, which is derived from the original data obtained after decryption, is input into the neural network computing module for training according to the gradient descent algorithm. During backpropagation, the data obfuscation module located in the trusted execution environment receives the gradient values and weight values from the first layer of the model and is updated synchronously. Because the initial non-zero positions differ in each training run, the sparse matrix obtained after training and the neural network model located in the untrusted GPU device also differ from run to run, which effectively resists differential attacks.

5. The privacy-preserving outsourced deep learning system according to claim 1, characterized in that the synchronous update only changes the non-zero elements of the sparse matrix; elements that are zero remain zero. Specifically, when the neural network computing module updates using the gradient descent algorithm, the gradient information in each round of backward iteration of the neural network is $\delta_0 = A^{T} \delta_1$, where $\delta_1$ is the gradient of the input layer of the neural network in the untrusted GPU device and $A$, with elements $A_{ij}$, is the random sparse matrix in the data obfuscation module. After the gradient of the random sparse matrix is calculated, the non-zero elements in the random sparse matrix are updated as follows:

6. The privacy-preserving outsourced deep learning system according to claim 1, characterized in that each iteration of neural network training involves one round trip of parameter transfer between the TEE and the GPU to achieve the synchronous update, specifically including: 1) the data obfuscation module within the TEE completes the obfuscation transformation of the original data and sends the desensitized data $X'_i$ to the neural network computing module located in the untrusted GPU device; 2) the neural network computing module completes one round of forward and backward propagation, and simultaneously calculates the input layer gradient δ1; 3) the neural network computing module outputs the input layer gradient δ1 to the data obfuscation module within the TEE, which completes its synchronous update; 4) the next round of training is conducted by repeating steps 1)-3).