A data privacy protection method of a non-interactive online medical pre-diagnosis system
By constructing a privacy transmission system that includes a trusted third-party institution, a group of data providers, an auxiliary cloud server, and a training cloud server, the problems of high communication volume and inference attacks in online medical pre-diagnosis systems are solved. This achieves non-interactive data privacy protection, is suitable for vertically partitioned data scenarios, and improves the security and efficiency of the model.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- GUILIN UNIV OF ELECTRONIC TECH
- Filing Date
- 2023-03-27
- Publication Date
- 2026-06-12
AI Technical Summary
Existing online medical pre-diagnosis systems face challenges in building machine learning models, including high communication volumes, the requirement for real-time online data providers, and the inability to defend against inference attacks. In particular, they struggle to achieve effective privacy protection in scenarios with vertically partitioned data.
A privacy transmission system is constructed, comprising a trusted third-party institution, a group of data providers, an auxiliary cloud server, and a training cloud server. Non-interactive data privacy protection is achieved through gradient update formula transformation, global data matrix generation, local data matrix encryption, and global model parameter updates.
In vertically partitioned data scenarios, data privacy is protected, computational and communication burdens are reduced, inference attacks are prevented, and the security of user data and the accuracy of models are ensured.
Figure CN116522377B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of privacy computing technology, specifically to a data privacy protection method for a non-interactive online medical pre-diagnosis system.
Background Technology
[0002] With the rapid development of the smart healthcare industry and people's increasing emphasis on their health, online medical pre-diagnosis systems have become an effective means for individuals to pre-diagnose their own illnesses. Medical users can send consultation requests to remote online medical pre-diagnosis systems through client applications and receive diagnostic results that support the selection of further medical interventions. This provides users with scientific treatment guidance, helping to make rational use of medical resources and determine the correct department and disease category.
[0003] The core of an online medical pre-diagnosis system is a machine learning model built on existing medical data. This model can then predict whether a user's unknown samples indicate a certain disease. The accuracy of the diagnosis depends on the reliability of the machine learning model, and building a high-precision model requires a large number of valid data samples. However, medical data is usually stored in a decentralized manner and is privacy-sensitive, which makes the construction of machine learning-based diagnostic systems extremely challenging. Privacy-preserving machine learning (PPML) models are likewise difficult to build.
[0004] Due to limitations in cryptographic techniques and the complexity of the model, existing PPML schemes still have the following limitations:
[0005] 1) The medical data involved in online medical pre-diagnosis systems is highly private. Many existing systems (such as federated learning-based solutions) have weak security and cannot effectively resist inference attacks. Adding noise through differential privacy to resist inference attacks perturbs the data and reduces model accuracy, which fails to meet the precision-medicine goal of a pre-diagnosis system.
[0006] 2) Medical data is highly dispersed. Building a machine learning model requires aggregating data samples from multiple parties, which has led to many PPML frameworks based on secure multi-party computation. However, these frameworks are interactive and require data providers to be online in real time, which lacks flexibility. They also place higher demands on network bandwidth and reliability, which hinders deployment in real-world scenarios.
[0007] 3) Most existing PPML solutions are designed for horizontally partitioned data and are not suitable for vertically partitioned data. Technically, vertically partitioned data requires secure aggregation along both the sample dimension and the attribute dimension.
Summary of the Invention
[0008] The purpose of this invention is to provide a data privacy protection method for a non-interactive online medical pre-diagnosis system, which solves the technical problems of high communication volume, real-time online requirement for data providers, and inability to resist inference attacks in existing privacy-preserving online medical pre-diagnosis systems.
[0009] To achieve the above objectives, the present invention provides a data privacy protection method for a non-interactive online medical pre-diagnosis system, comprising the following steps:
[0010] Building a privacy transmission system that includes a trusted third-party institution, a data provider group, an auxiliary cloud server, and a training cloud server;
[0011] Gradient update formula transformation;
[0012] Global data matrix generation;
[0013] Local data matrix encryption;
[0014] Global model parameter update.
[0015] Optionally, the trusted third-party institution is responsible for building the underlying cryptographic system, transmitting public keys to the other entities, and providing private key services to the auxiliary cloud server;
[0016] The data provider group includes several data providers. Each data provider possesses a subset of the data attributes and preprocesses these private attributes to generate a local data matrix. Using inner product function encryption and random masking techniques, the local data matrix is encrypted into matrix ciphertext and uploaded to the auxiliary cloud server;
[0017] The auxiliary cloud server is used for privacy-preserving aggregation. Using the function-derived key sent by the trusted third-party institution, it decrypts the ciphertext of the local data matrices to obtain a blinded global data matrix;
[0018] The training cloud server uses the inverse of the blinding factor sent by the trusted third-party institution to decrypt the blinded global data matrix and recover the true result of the aggregated global data matrix.
[0019] Optionally, the gradient update formula transformation process includes the following steps:
[0020] Step 1: Approximate the logistic function by least squares over the given interval to obtain an approximate form of the loss function:
[0021] ,
[0022] in, , , ;
[0023] Step 2: Take the partial derivative with respect to each component of the model parameters to obtain an approximate form of the gradient update formula:
[0024]
[0025] in, It is the current iteration number. ;
[0026] Step 3: Using matrix calculations and summation formulas, transform the mathematical form of the gradient update formula to obtain a gradient update formula that decouples user data and model parameters:
[0027]
[0028] Optionally, during global data matrix generation, the data provider group processes the local medical data uniformly to generate local data matrices of the same dimension. Specifically, each local vertical medical data sample is padded into a vector whose dimension is the number of features plus 1, the local data matrix is then computed from this padded vector, and the global data matrix is ultimately formed.
[0029] Optionally, the process of encrypting the local data matrix includes the following steps:
[0030] System initialization;
[0031] Data encryption;
[0032] Data decryption.
[0033] Optionally, the system initialization task is to generate the relevant parameters for the inner product function encryption system. The system initialization process includes the following steps:
[0034] Step 1: The trusted third-party institution executes this step. It takes the security parameter as input and generates the mathematical groups, where the vector dimension is defined as the number of samples; a public key and a private key are generated from the master key;
[0035] Step 2: The trusted third-party institution randomly selects a one-time blinding factor and computes its inverse;
[0036] Step 3: The trusted third-party institution assigns the public key to one data provider, assigns the one-time blinding factor to another data provider, and distributes the inverse of the blinding factor to the training cloud server. During the training phase, the trusted third-party institution provides private key services to the auxiliary cloud server. The two data providers mentioned here are different data providers.
[0037] Optionally, during data encryption, one data provider encrypts its local data with the public key, and the other data provider encrypts its local data with the one-time blinding factor.
[0038] Optionally, the data decryption process includes the following steps:
[0039] Step 1: The trusted third-party institution generates a function-derived key from the blinded vector it receives and distributes the key to the auxiliary cloud server;
[0040] Step 2: The auxiliary cloud server takes the ciphertext and the function-derived key as input and returns the discrete logarithm with respect to the specified base:
[0041]
[0042] Then, the auxiliary cloud server uploads this result to the training cloud server;
[0043] Step 3: Upon receiving the result, the training cloud server uses the inverse of the blinding factor to compute
[0044] ;
[0045] Step 4: By performing the above operations on each element of the matrix, the training cloud server obtains the matrix:
[0046] ;
[0047] Furthermore, to simplify the notation, shorthand symbols are used for the matrix elements, and the matrix is represented as
[0048] .
[0049] Optionally, the global model parameter update process includes the following steps:
[0050] Step 1: The training cloud server initializes the learning rate, the maximum number of iterations, and the convergence threshold, and randomly initializes the model parameters;
[0051] Step 2: The training cloud server extracts the required elements from the matrix;
[0052] Step 3: The training cloud server updates the model parameters by computing
[0053]
[0054] in, ;
[0055] Step 4: The training cloud server repeats steps 2 and 3 until the maximum number of iterations is reached or the parameters converge to within the convergence threshold, finally obtaining the global model parameters.
[0056] This invention provides a data privacy protection method for a non-interactive online medical pre-diagnosis system. For complex vertically partitioned medical data scenarios, a PPML model that protects user data is constructed. Unlike traditional PPML schemes, the data provider group preprocesses the local vertical medical dataset, transforming local data attributes into a data matrix. This matrix is encrypted using inner product function encryption and a random blinding factor and then uploaded to a cloud server, after which the data providers can go offline immediately, enabling non-interactive model training. An auxiliary cloud server is introduced for privacy-preserving data aggregation, while the training cloud server is responsible for training the LR model. There is only one one-way, single-round communication between the two servers. Furthermore, since the auxiliary cloud server obtains only the blinded aggregation result, while the training cloud server obtains only the global model trained on the integrated data, individual user information cannot be obtained through inference attacks, thus protecting privacy.
Attached Figure Description
[0057] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0058] Figure 1 is a schematic diagram of the structure of horizontally partitioned medical data in an online medical pre-diagnosis system.
[0059] Figure 2 is a schematic diagram of the structure of vertically partitioned medical data in an online medical pre-diagnosis system.
[0060] Figure 3 is a flowchart of a data privacy protection method for a non-interactive online medical pre-diagnosis system according to the present invention.
[0061] Figure 4 is a schematic diagram of the privacy transmission system model of the present invention.
Detailed Implementation
[0062] Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and intended to explain the present invention, and should not be construed as limiting the present invention.
[0063] The following provides further explanation in conjunction with the background technology and execution steps:
[0064] Due to the diversity of medical data, the samples held by different organizations follow two distribution patterns, as shown in Figure 1 and Figure 2. In the horizontal partitioning scenario shown in Figure 1, Hospital A and Hospital B each hold complete patient medical data, including glucose, BMI, age, insulin levels, and class labels (positive / negative). Apart from the identity identifiers, there is no overlap between the local medical datasets of Hospitals A and B. In the vertical partitioning scenario shown in Figure 2, Hospital A possesses two of the patient's attributes, namely glucose and BMI, while Hospital B possesses two other attributes, namely age and insulin level. Classification labels belong to Hospital A, Hospital B, or both; a complete dataset can only be formed after point-to-point communication and integration between the two hospitals.
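The difference between the two partitioning patterns can be illustrated with a small array example. This is only a sketch: the numeric records are hypothetical, and placing the label column with Hospital B in the vertical case is an assumption made for the illustration (the text allows the labels to sit with either hospital or both).

```python
import numpy as np

# Hypothetical toy records; columns: glucose, BMI, age, insulin, label.
full = np.array([
    [148.0, 33.6, 50.0,  0.0, 1.0],
    [ 85.0, 26.6, 31.0,  0.0, 0.0],
    [183.0, 23.3, 32.0,  0.0, 1.0],
    [ 89.0, 28.1, 21.0, 94.0, 0.0],
])

# Horizontal partitioning: each hospital holds ALL columns for DIFFERENT patients.
hosp_a_h = full[:2, :]
hosp_b_h = full[2:, :]

# Vertical partitioning: both hospitals hold the SAME patients, DIFFERENT columns.
hosp_a_v = full[:, :2]   # glucose, BMI
hosp_b_v = full[:, 2:]   # age, insulin, label (label placement is an assumption)

# Horizontal data is recombined by stacking rows, vertical data by joining columns.
rebuilt_h = np.vstack([hosp_a_h, hosp_b_h])
rebuilt_v = np.hstack([hosp_a_v, hosp_b_v])
```

In the vertical case, no single hospital can compute any statistic that mixes its own columns with the other hospital's columns, which is why aggregation along the attribute dimension is needed.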
[0065] To achieve a non-interactive online medical pre-diagnosis system resistant to inference attacks and based on vertical datasets, this invention provides a data privacy protection method for a non-interactive online medical pre-diagnosis system, comprising the following steps:
[0066] S1: Building a privacy transmission system that includes a trusted third-party institution, a data provider group, an auxiliary cloud server, and a training cloud server;
[0067] S2: Gradient update formula transformation;
[0068] S3: Global data matrix generation;
[0069] S4: Local data matrix encryption;
[0070] S5: Global model parameter update.
[0071] The basic idea of the scheme is as follows. Each data provider first generates a local data matrix locally, then encrypts the local matrix using the public key and blinding factor published by the trusted third-party institution, and uploads the ciphertext to the auxiliary cloud server through a single one-way communication, after which it can go offline. After receiving the ciphertext uploaded by the data providers, the auxiliary cloud server decrypts it with the function-derived key distributed by the trusted third-party institution to obtain a data matrix hidden by a random blinding factor. It transmits this decryption result to the training cloud server via a single one-way communication and no longer participates in training afterwards. The training cloud server uses the random blinding factor information distributed by the trusted third-party institution to recover the global data matrix, which is then used to train the global model parameters. The detailed process of this invention is shown in Figure 3.
[0072] Referring further to Figure 4, the privacy transmission system model mainly includes four types of entities: (1) the trusted third-party institution; (2) the data provider group; (3) the auxiliary cloud server; (4) the training cloud server.
[0073] (1) The trusted third-party institution is, for example, a government agency or other influential organization. It is responsible for building the underlying cryptographic system, transmitting public keys to the other entities, and providing private key services to the auxiliary cloud server.
[0074] (2) Each data provider possesses a subset of the data attributes and preprocesses its private attributes to generate a local data matrix. Using inner product function encryption and random masking techniques, it encrypts the local data matrix into matrix ciphertext and uploads it to the auxiliary cloud server.
[0075] (3) The auxiliary cloud server is used for privacy-preserving aggregation. Using the function-derived key sent by the trusted third-party institution, it decrypts the ciphertext of the local data matrices to obtain a blinded global data matrix.
[0076] (4) The training cloud server is a logistic regression (LR) training server with strong storage and computing capabilities. It uses the inverse of the blinding factor sent by the trusted third-party institution to decrypt the blinded global data matrix and recover the true result of the aggregated global data matrix. This matrix is used to train an LR model in the plaintext domain.
[0077] Specifically, in this embodiment of the invention, it is assumed that two data providers jointly train the LR model on a vertically partitioned medical dataset. The dataset, which consists of a number of samples and a number of features, is divided by attribute between the two data providers. The first data provider holds an attribute matrix covering the attributes it possesses. The second data provider holds an attribute matrix covering the attributes it possesses, together with the labels.
[0078] In step S2, the gradient update formula transformation process includes the following steps:
[0079] Step 2.1: Approximate the logistic function by least squares over the given interval to obtain an approximate form of the loss function:
[0080] ,
[0081] in, , , ;
[0082] Step 2.2: Take the partial derivative with respect to each component of the model parameters to obtain an approximate form of the gradient update formula:
[0083]
[0084] in, It is the current iteration number. ;
[0085] Step 2.3: Using matrix calculations and summation formulas, transform the mathematical form of the gradient update formula to obtain a gradient update formula that decouples user data and model parameters:
[0086]
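The least-squares approximation in step 2.1 can be illustrated numerically. The sketch below is not the patent's formula: the fitting interval `[-R, R]` with `R = 4`, the polynomial degree 2, and the choice to fit the logistic loss `log(1 + e^{-z})` directly are all assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as poly

R = 4.0      # hypothetical half-width of the fitting interval [-R, R]
DEGREE = 2   # hypothetical polynomial degree

# Sample the logistic loss log(1 + e^{-z}) on a dense symmetric grid.
z = np.linspace(-R, R, 801)
loss = np.log1p(np.exp(-z))

# Discrete least-squares fit; coefficients are returned lowest degree first.
a0, a1, a2 = poly.polyfit(z, loss, DEGREE)

# Worst-case deviation of the polynomial from the true loss on the grid.
approx = poly.polyval(z, [a0, a1, a2])
max_err = np.max(np.abs(approx - loss))
```

With a polynomial loss of this kind, the gradient with respect to the model parameters becomes a polynomial in the parameters whose coefficients are sums of products of data entries. This is what allows the data-dependent sums to be computed once, separately from the parameters, as step 2.3 describes.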
[0087] In step S3, the global data matrix is generated as follows.
[0088] This step includes local data matrix generation:
[0089] The first data provider pads each of its local vertical medical data samples into a vector whose dimension is the number of features plus 1:
[0090] .
[0091] Then, it uses the padded vector to calculate the local data matrix:
[0092] .
[0093] Similarly, the second data provider pads each of its local vertical medical data samples, together with the corresponding label, into a vector of the same dimension:
[0094] .
[0095] Then, it uses the padded vector to calculate the local data matrix:
[0096] .
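One plausible reading of the padding step can be sketched as follows. The exact padding layout and matrix formula are not recoverable from the text, so everything here is an assumption: provider 1 zero-pads its attributes on the right, provider 2 zero-pads on the left and appends the label, and the global matrix is taken to be the Gram matrix of the combined padded samples. All sizes and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d1, d2 = 6, 2, 2                        # hypothetical sample and attribute counts
X1 = rng.integers(0, 10, size=(n, d1))     # provider 1: its attributes
X2 = rng.integers(0, 10, size=(n, d2))     # provider 2: its attributes
y = rng.choice([-1, 1], size=(n, 1))       # provider 2: labels
D = d1 + d2 + 1                            # padded dimension: features + 1

# Zero-pad every sample to the common dimension D (assumed layout).
P1 = np.hstack([X1, np.zeros((n, d2 + 1), dtype=int)])
P2 = np.hstack([np.zeros((n, d1), dtype=int), X2, y])

# Blocks each provider can compute on its own.
M1 = P1.T @ P1
M2 = P2.T @ P2

# Cross block: every entry is an inner product between one n-dimensional
# column held by provider 1 and one held by provider 2.
C = np.array([[P1[:, j] @ P2[:, k] for k in range(D)] for j in range(D)])

# Assembling the blocks gives the Gram matrix of the combined data.
M_global = M1 + M2 + C + C.T
full = P1 + P2
```

Under this reading, only the cross block requires cooperation between providers, and each of its entries is an inner product of vectors whose length equals the number of samples, which is the kind of quantity an inner product encryption scheme can evaluate.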
[0097] Step S4: The process of encrypting the local data matrix includes the following steps:
[0098] (1) System initialization: The task of the initialization phase is to generate the relevant parameters for the inner product function encryption system. To protect data samples and local gradients, this invention uses both public keys and random blinding factors to encrypt data. The initialization steps are as follows.
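[0099] Step 1: The trusted third-party institution executes this step. It takes the security parameter as input and generates the mathematical groups, where the vector dimension is defined as the number of samples. A public key and a private key are generated from the master key.
[0100] Step 2: The trusted third-party institution randomly selects a one-time blinding factor and computes its inverse.
[0101] Step 3: The trusted third-party institution assigns the public key to the first data provider, assigns the blinding factor to the second data provider, and distributes the inverse of the blinding factor to the training cloud server. During the training phase, the trusted third-party institution provides private key services to the auxiliary cloud server.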
[0102] (2) Data encryption:
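[0103] In this invention, the local data of the two data providers is encrypted in two different ways: ① with the system's public key; ② with the random blinding factor. The detailed encryption process is as follows:
[0104] In the preprocessing stage, the first data provider uses its samples to generate one local data matrix per sample. For each row and column position, it collects the element at that position from every matrix to obtain a vector whose dimension equals the number of samples. It can therefore obtain one such vector per position from all the matrices. For each vector, it uses the public key sent by the trusted third-party institution to encrypt the vector, obtains the ciphertext vector, and uploads it to the auxiliary cloud server.
[0105] Similarly, the second data provider obtains vectors from its matrices. For each vector, it uses the blinding factor sent by the trusted third-party institution to compute a blinded vector and uploads the result.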
[0106] (3) Data decryption:
[0107] At this stage, decryption is divided into two phases: ① the first phase takes place at the auxiliary cloud server, which decrypts a pseudo inner product (the inner product after blinding); ② the second phase takes place at the training cloud server, which restores the true inner product of the two data providers' vectors:
[0108] Step 1: The trusted third-party institution generates a function-derived key from the blinded vector it receives and distributes the key to the auxiliary cloud server.
[0109] Step 2: The auxiliary cloud server takes the ciphertext and the function-derived key as input and returns the discrete logarithm with respect to the specified base:
[0110] .
[0111] Then, the auxiliary cloud server uploads this result to the training cloud server.
[0112] Step 3: Upon receiving the result, the training cloud server uses the inverse of the blinding factor to compute
[0113] .
[0114] Step 4: By performing the above operation on each element of the matrix, the training cloud server obtains the matrix
[0115]
[0116] To simplify the notation, shorthand symbols are used for the matrix elements. The matrix can therefore be represented as
[0117] .
[0118] Step S5: Global model parameter update.
[0119] The process of updating global model parameters includes the following steps:
[0120] Step 1: The training cloud server initializes the learning rate, the maximum number of iterations, and the convergence threshold, and randomly initializes the model parameters;
[0121] Step 2: The training cloud server extracts the required elements from the matrix;
[0122] Step 3: The training cloud server updates the model parameters by computing
[0123]
[0124] in, ;
[0125] Step 4: The training cloud server repeats steps 2 and 3 until the maximum number of iterations is reached or the parameters converge to within the convergence threshold, finally obtaining the global model parameters.
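The four training steps can be sketched as a loop that touches only the aggregated matrix, never individual samples. The update rule below is an assumption, since the patent's formula is not recoverable from the text: it uses a quadratic approximation `a0 + a1*z + a2*z^2` of the logistic loss with labels in {-1, +1}, and the coefficients `a1 = -0.5`, `a2 = 0.125` are hypothetical stand-ins taken from the second-order Taylor expansion at zero. The synthetic data and all variable names are likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3                               # hypothetical sample and feature counts
X = rng.normal(size=(n, d))
y = np.where(X @ np.array([1.0, -2.0, 0.5]) > 0, 1.0, -1.0)

# Global data matrix as the training server would hold it after unblinding:
# Gram matrix of the samples extended with their labels.
Z = np.hstack([X, y[:, None]])
M = Z.T @ Z

# Step 1: initialize hyperparameters and random model parameters.
a1, a2 = -0.5, 0.125                        # assumed loss-approximation coefficients
alpha, max_iter, eps = 1.0, 1000, 1e-10
theta = rng.normal(size=d)

# Step 2: extract the entries the update needs.
A = M[:d, :d]                               # sum over samples of x x^T
b = M[:d, d]                                # sum over samples of y * x

# Steps 3 and 4: iterate until convergence or the iteration cap.
for k in range(max_iter):
    grad = (a1 * b + 2 * a2 * A @ theta) / n
    new_theta = theta - alpha * grad
    converged = np.linalg.norm(new_theta - theta) < eps
    theta = new_theta
    if converged:
        break
```

Because the approximate gradient depends on the data only through `A` and `b`, the data providers do not need to stay online during training, which is consistent with the non-interactive design described above.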
[0126] In summary, in an online medical pre-diagnosis system oriented towards vertically partitioned data, this invention protects the privacy of the data providers against malicious inference attacks from internal adversaries. The communication among the data providers, the auxiliary cloud server, and the training cloud server is non-interactive, which reduces the computational and communication burden and gives the method high practical value.
[0127] Before uploading medical data to the cloud server, the data provider performs two phases of tasks. The preprocessing phase uses vector padding and matrix calculations to obtain a local data matrix. Its purpose is to unify the dimensions of the diverse vertical medical data and prepare it for the non-interactive phase. The encryption phase uses the public key and random blinding factor distributed by the trusted third-party institution to encrypt the local data matrix, with the aim of ensuring privacy while allowing the auxiliary cloud server to execute aggregation tasks. Unlike traditional dual-cloud server architectures that require frequent interactions between servers, the auxiliary cloud server obtains the blinded result and uploads it to the training cloud server once. After that, there is no further communication between the two cloud servers, which makes the scheme better suited to real-world deployment.
[0128] In existing privacy-preserving machine learning work, cloud servers mostly train models over encrypted data, which results in a significant computational burden. In this invention, the training cloud server trains the model in plaintext using the unblinded matrix, which keeps the computational burden light. Furthermore, because the unblinded matrix is an aggregation of local data matrices, user medical data cannot be stolen through inference attacks. Therefore, this invention provides strong security for users' private data.
[0129] The above description discloses only one preferred embodiment of the present invention, and should not be construed as limiting the scope of the present invention. Those skilled in the art will understand that all or part of the processes of the above embodiments can be implemented, and equivalent changes made in accordance with the claims of the present invention are still within the scope of the invention.
Claims
1. A data privacy protection method for a non-interactive online medical pre-diagnosis system, characterized in that it includes the following steps: Building a privacy transmission system that includes a trusted third-party institution, a data provider group, an auxiliary cloud server, and a training cloud server; The trusted third-party institution is responsible for building the underlying cryptographic system, transmitting public keys to the other entities, and providing private key services to the auxiliary cloud server; The data provider group includes several data providers, each of which possesses a subset of the data attributes and preprocesses these private attributes to generate a local data matrix; using inner product function encryption and random masking techniques, the local data matrix is encrypted into matrix ciphertext and uploaded to the auxiliary cloud server; The auxiliary cloud server is used for privacy-preserving aggregation and, using the function-derived key sent by the trusted third-party institution, decrypts the ciphertext of the local data matrices to obtain a blinded global data matrix; The training cloud server uses the inverse of the blinding factor sent by the trusted third-party institution to decrypt the blinded global data matrix in order to recover the true result of the aggregated global data matrix; Gradient update formula transformation; Global data matrix generation; During global data matrix generation, the data provider group processes the local medical data in a unified manner to generate local data matrices of the same dimension. Specifically, each vertical medical data sample is padded into a vector whose dimension is the number of features plus 1, the local data matrix is then calculated from the padded vector, and finally a global data matrix is formed.
Local data matrix encryption; The process of encrypting a local data matrix includes the following steps: System initialization; Data encryption; Data decryption; The task of system initialization is to generate the relevant parameters for the inner product function encryption system. The system initialization process includes the following steps: Step 1: The trusted third-party institution executes this step, taking the security parameter as input and generating the mathematical groups, where the vector dimension is defined as the number of samples; a public key and a private key are generated from the master key; Step 2: The trusted third-party institution randomly selects a one-time blinding factor and calculates its inverse; Step 3: The trusted third-party institution assigns the public key to one data provider, assigns the one-time blinding factor to another data provider, and distributes the inverse of the blinding factor to the training cloud server; during the training phase, the trusted third-party institution provides private key services to the auxiliary cloud server, where the two data providers are different data providers; During the data encryption process, one data provider encrypts its local data with the public key, and the other data provider encrypts its local data with the one-time blinding factor; The data decryption process includes the following steps: Step 1: The trusted third-party institution generates a function-derived key from the blinded vector it receives and distributes the key to the auxiliary cloud server; Step 2: The auxiliary cloud server takes the ciphertext and the function-derived key as input and returns the discrete logarithm with respect to the specified base; the auxiliary cloud server then uploads this result to the training cloud server; Step 3: Upon receiving the result, the training cloud server uses the inverse of the blinding factor to calculate the true inner product; Step 4: By performing the above operations on each element of the matrix, the training cloud server obtains the matrix; Furthermore, to simplify the notation, shorthand symbols are used for the matrix elements and the matrix is represented accordingly; Global model parameter update.
2. The data privacy protection method for the non-interactive online medical pre-diagnosis system as described in claim 1, characterized in that the process of transforming the gradient update formula includes the following steps: Step 1: Approximate the logistic function by least squares over the given interval to obtain an approximate form of the loss function; Step 2: Take the partial derivative with respect to each component of the model parameters to obtain an approximate form of the gradient update formula, indexed by the current iteration number; Step 3: Using matrix calculations and summation formulas, transform the mathematical form of the gradient update formula to obtain a gradient update formula that decouples user data and model parameters.
3. The data privacy protection method for the non-interactive online medical pre-diagnosis system as described in claim 2, characterized in that the process of updating global model parameters includes the following steps: Step 1: The training cloud server initializes the learning rate, the maximum number of iterations, and the convergence threshold, and randomly initializes the model parameters; Step 2: The training cloud server extracts the required elements from the matrix; Step 3: The training cloud server updates the model parameters through calculation; Step 4: The training cloud server repeats steps 2 and 3 until the maximum number of iterations is reached or the parameters converge to within the convergence threshold, finally obtaining the global model parameters.