A federated learning method, a blockchain system, and a storage medium

By combining global public-key encryption and the Paillier algorithm with threshold homomorphic encryption, along with blockchain technology, the security issues of federated learning under heterogeneous data are solved, enabling secure data sharing and model training across organizations, and ensuring data privacy and model accuracy.

CN116415693B · Active · Publication Date: 2026-06-12 · CHINA MOBILE COMM LTD RES INST +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHINA MOBILE COMM LTD RES INST
Filing Date
2021-12-30
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

In existing technologies, blockchain-based federated learning computation for heterogeneous data is untrusted. In particular, cross-organization modeling lacks security measures to protect privacy when one organization holds the data attributes and another holds the data labels, so models cannot be trained effectively.

Method used

The method employs global public-key encryption and global private-key decryption, records update values and data tags on the blockchain, protects data privacy with the Paillier threshold homomorphic encryption algorithm, and implements gradient calculation and model updates through smart contracts to ensure data security and non-repudiation.

Benefits of technology

It enables secure data sharing and model training under heterogeneous data conditions, ensuring data security and reliability, while providing auditing functions and accountability mechanisms for participating parties.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a federated learning method, a blockchain system, and a storage medium. The method comprises: a first terminal of a first organization that is a data owner calculates an update value using data attributes, encrypts it with a global public key, and records it on the blockchain; a second terminal of a second organization that is a data owner encrypts data labels with the global public key and records them on the blockchain; through a smart contract, the first terminal and the second terminal decrypt the encrypted update value and labels using a global private key and calculate a gradient from the decrypted update value and labels, wherein the global private key is generated and distributed to the data owners by a trusted authority; the first terminal and the second terminal record the calculated gradient on the blockchain; and the model is updated according to the gradient through a smart contract. The application guarantees data security, and the records cannot be modified or denied. If a malicious participant is found, it can be held accountable directly.

Description

Technical Field

[0001] This invention relates to the field of communication technology, and in particular to a federated learning method, a blockchain system, and a storage medium.

Background Technology

[0002] The massive amounts of data from various industries and institutions are scattered across different companies and organizations. Different companies and organizations accumulate and apply different types of data. For example, telecom operators accumulate and use user communication data, e-commerce companies accumulate and use user consumption behavior and data, while banks accumulate and use financial and credit data.

[0003] However, from a business intelligence perspective, this intelligent revolution driven by AI (Artificial Intelligence) aims to create comprehensive user profiles. Cross-industry data collaboration presents significant challenges. Federated learning is a technological system that securely combines data from multiple parties for modeling. As a demonstrative application of blockchain platforms, multi-party collaborative modeling scenarios aim to leverage blockchain platforms combined with federated learning technology to solve the problems existing in multi-party collaborative modeling and achieve a distributed multi-party collaborative modeling platform that meets the data security requirements of all participating parties.

[0004] Federated learning is a machine learning framework that effectively helps multiple industries and organizations use data and perform machine learning modeling while meeting user privacy, data security, and government regulations. Federated learning can effectively solve the data silo problem, enabling participants to jointly model data without sharing existing data, thus achieving AI collaboration.

[0005] Blockchain is a novel application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. Through distributed data storage, it effectively solves the control problem of centralized storage, enabling peer-to-peer information exchange and sharing without the need for third-party intervention.

[0006] The shortcoming of existing technologies lies in the fact that training data used for multi-party collaborative modeling is collected by multiple organizations from various users. Training data is categorized into homogeneous and heterogeneous data. Currently, most similar solutions are blockchain-based federated learning for homogeneous data, but lack privacy-preserving solutions for heterogeneous data; that is, blockchain-based federated learning computation for heterogeneous data is untrustworthy.

Summary of the Invention

[0007] This invention provides a federated learning method, a blockchain system, and a storage medium to address the problem of untrusted computation in blockchain-based federated learning for heterogeneous data.

[0008] This invention provides the following technical solutions:

[0009] A federated learning method includes:

[0010] A first terminal of a first organization that is a data owner calculates an update value using its data attributes, encrypts the update value with a global public key, and records it on the blockchain. A second terminal of a second organization that is a data owner encrypts its data tags with the global public key and records them on the blockchain. The global public key is generated and distributed to the data owners by a trusted authority.

[0011] Through a smart contract, the first and second terminals use a global private key to decrypt the encrypted update value and tags, and calculate the gradient from the decrypted update value and tags. The global private key is generated and distributed to the data owners by the trusted authority.

[0012] The first and second terminals record the calculated gradients on the blockchain.

[0013] The model is updated based on gradients via smart contracts.

[0014] In practice, the data tags contain privacy-sensitive data.

[0015] During implementation, it further includes:

[0016] The data tags are encrypted using Paillier threshold homomorphic encryption.

[0017] In implementation, data tags are encrypted using a global public key and recorded on the blockchain, including:

[0018] The second organization uses smart contracts to write data tags and other relevant data onto the blockchain; it adds predefined information through smart contracts, and after consensus is reached on the blockchain, the data is written into the blockchain's state database, and the corresponding data query function is implemented through smart contracts for subsequent auditing.

[0019] In practice, the data tags recorded on the blockchain include one or a combination of the following information:

[0020] The index of the tag data, the encrypted value of the tag data, the timestamp of the current data upload, and the identity identifier of the current data uploader.

[0021] In implementation, recording the update values calculated using data attributes on the blockchain after encryption with a global public key includes:

[0022] The first organization writes the update values $[a^{l}]$, $[z^{l}]$ and $[\mathrm{ReLU}(z^{l})]$, the hash values of these data, and other related data to the blockchain via a smart contract. Predefined information is added through the smart contract. After blockchain consensus is reached, the data is written to the blockchain's state database. The corresponding data query function is implemented through smart contracts for subsequent auditing.

[0023] In practice, the data attributes recorded on the blockchain include one or a combination of the following:

[0024] The identifier of the data attribute; the current iteration number and the current layer number; the current value of $[a^{l}]$ and the hash of that value; the current value of $[z^{l}]$ and the hash of that value; the current value of $[\mathrm{ReLU}(z^{l})]$ and the hash of that value; the timestamp of the current data upload; and the identity identifier of the current data uploader.

[0025] In practice, the gradients recorded on the blockchain are uploaded in the form of private data.

[0026] During implementation, it further includes:

[0027] The accuracy of the updated model is verified through smart contracts.

[0028] During implementation, it further includes:

[0029] If the model's accuracy is reduced, then validation is performed.

[0030] During implementation, verification includes:

[0031] The gradient is calculated after obtaining unencrypted data attributes from the first terminal and unencrypted data tags from the second terminal.

[0032] A federated learning blockchain system includes: a trusted authority, a first terminal belonging to a first organization of data owners, a second terminal belonging to a second organization of data owners, and a blockchain with the first terminal and the second terminal as nodes, wherein:

[0033] A trusted authority is used to generate global public and private keys and distribute them to data owners;

[0034] The first terminal is used to calculate the updated value using data attributes. After being encrypted with a global public key, it is recorded on the blockchain. The first terminal and the second terminal use a global private key to decrypt the encrypted updated value and tag through a smart contract. The gradient is calculated based on the decrypted updated value and tag, and the calculated gradient is recorded on the blockchain.

[0035] The model is updated based on gradient execution via smart contracts;

[0036] The second terminal is used to record the data tags on the blockchain after they are encrypted with a global public key. The first and second terminals use a global private key to decrypt the encrypted update value and tag through a smart contract. The gradient is calculated based on the decrypted update value and tag, and the calculated gradient is recorded on the blockchain.

[0037] The model is updated based on gradients via smart contracts.

[0038] In practice, the data tags further processed by the second terminal contain privacy-related data.

[0039] In practice, the second terminal is further used to encrypt the data tag using Paillier threshold homomorphic encryption.

[0040] In implementation, when recording the data tags on the blockchain after encryption with the global public key, the second terminal is further used for the following:

[0041] The second organization uses smart contracts to write data tags and other relevant data onto the blockchain; it adds predefined information through smart contracts, and after consensus is reached on the blockchain, the data is written into the blockchain's state database, and the corresponding data query function is implemented through smart contracts for subsequent auditing.

[0042] In implementation, the data tags that the second terminal records on the blockchain include one or a combination of the following information:

[0043] The index of the tag data, the encrypted value of the tag data, the timestamp of the current data upload, and the identity identifier of the current data uploader.

[0044] In implementation, when recording the update values calculated using data attributes on the blockchain after encryption with the global public key, the first terminal is further used for the following:

[0045] The first organization writes the update values $[a^{l}]$, $[z^{l}]$ and $[\mathrm{ReLU}(z^{l})]$, the hash values of these data, and other related data to the blockchain via a smart contract. Predefined information is added through the smart contract. After blockchain consensus is reached, the data is written to the blockchain's state database. The corresponding data query function is implemented through smart contracts for subsequent auditing.

[0046] In implementation, the data attributes that the first terminal records on the blockchain include one or a combination of the following:

[0047] The identifier of the data attribute; the current iteration number and the current layer number; the current value of $[a^{l}]$ and the hash of that value; the current value of $[z^{l}]$ and the hash of that value; the current value of $[\mathrm{ReLU}(z^{l})]$ and the hash of that value; the timestamp of the current data upload; and the identity identifier of the current data uploader.

[0048] In practice, the gradients recorded on the blockchain by the second terminal and the first terminal are recorded in the form of privacy data.

[0049] In practice, the second terminal and the first terminal are further used to verify the accuracy of the updated model through smart contracts.

[0050] During implementation, it further includes:

[0051] A block manager, used to perform verification if the model's accuracy is reduced.

[0052] In implementation, when performing verification, the block manager is further used for the following:

[0053] The gradient is calculated after obtaining unencrypted data attributes from the first terminal and unencrypted data tags from the second terminal.

[0054] A computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described federated learning method.

[0055] The beneficial effects of this invention are as follows:

[0056] The technical solution provided by this invention addresses the case of incomplete data owners, that is, one or more data owners that possess data attributes but lack labels. For this case, a technical solution for label data protection and model data sharing is proposed based on blockchain and privacy-preserving federated learning. During collaborative modeling, blockchain technology enables data sharing; however, the data of the first and second organizations is not transmitted externally in the clear but is distributed as ciphertext after encryption. Data security therefore rests on the encryption algorithm used, and because that algorithm is semantically secure, data security is ensured.

[0057] Furthermore, since the data of the first and second organizations is recorded on the blockchain, this record cannot be modified or denied.

[0058] Furthermore, if a malicious participant is found after the smart contract verifies the model's accuracy, it can be held accountable directly.

Attached Figure Description

[0059] The accompanying drawings, which are included to provide a further understanding of the invention and form part of this invention, illustrate exemplary embodiments of the invention and are used to explain the invention, but do not constitute an undue limitation of the invention. In the drawings:

[0060] Figure 1 is a schematic diagram of the implementation process of the federated learning method in an embodiment of the present invention;

[0061] Figure 2 is a schematic diagram of a federated learning collaborative modeling scenario in an embodiment of the present invention;

[0062] Figure 3 is a schematic diagram of the structure of a federated learning blockchain system in an embodiment of the present invention.

Detailed Implementation

[0063] The inventor noticed the following during the invention process:

[0064] Blockchain is a novel application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. Through distributed data storage, it effectively solves the dominance problem of centralized storage, enabling peer-to-peer information exchange and sharing without the need for third-party intervention. However, since blockchain itself does not support complex computations, how to complete complex computations such as multi-party collaborative learning and federated learning, while ensuring computational reliability, model availability, and data privacy during model training, is a problem that the technical solution provided in this invention aims to solve, based on blockchain and federated learning.

[0065] Training data for multi-party collaborative modeling is collected by multiple organizations from various users. Training data is categorized as homogeneous and heterogeneous. Data collected by different organizations with overlapping users but non-overlapping data features is called heterogeneous data; data with non-overlapping users but overlapping data features is called homogeneous data. Currently, most similar solutions are blockchain-based federated learning for homogeneous data. However, some privacy protection measures are lacking in the following application scenarios:

[0066] Existing federated learning for heterogeneous data addresses the problem of multiple organizations, as data owners, wanting to train a model on their shared data. Each organization's data contains one or more attributes and labels. Their data attributes do not overlap.

[0067] However, current federated learning for heterogeneous data fails to address or consider the situation where organizations lack labels. Organization A possesses data attributes, while organization B possesses both data and labels. Currently, there is a lack of security measures to combine the data attributes of organization A with the data labels of organization B for model training. To further illustrate this scenario, consider this example: telecom operators possess user communication data and labels, while e-commerce companies possess user consumption data and labels. When performing cross-organizational collaborative modeling, for example, when modeling product recommendation services for users, telecom operators lack user consumption behavior labels, and when recommending phone plans to users, e-commerce companies lack user phone bill labels. Therefore, it is necessary to share user behavior labels based on the application scenario to complete collaborative modeling.

[0068] Existing blockchain-based federated learning for homogeneous data struggles to efficiently protect the privacy of intermediate gradients. This application presents a solution for federated learning on heterogeneous data in which organization A possesses the data attributes and organization B possesses the labels. The model is trained by combining the data attributes of organization A with the data labels of organization B. Following an MLP training algorithm, organizations A and B collaborate through a federated learning computational model, iteratively updating the model over multiple rounds.

[0069] The MLP (Multi-Layer Perceptron) training algorithm used in this implementation is introduced below: The model trained in this implementation is an L-layer MLP, using the ReLU activation function and the MSE (mean-square error) loss function.

[0070] $z^{l} = W^{l} a^{l-1}$ for $1 \le l \le L$

[0071]

[0072] To address the above situation, the basic steps of the t-th round of federated learning model iteration are as follows, where $W_t$ is the model for this round:

[0073] Step 1: Organization A locally computes $a^{l}$, $z^{l}$ and $\mathrm{ReLU}(z^{l})$, and sends the results to the server;

[0074] Step 2: Organization B sends its label data to the server, which then computes the gradient $F_k(W)$:

[0075]

[0076]

[0077] Step 3: Server performs model update:

[0078] $W_{t+1} = W_t - \eta F_k(W)$

[0079] Here $N$ is the total number of instances, $N_k$ is the number of instances of the k-th data owner, and $\eta$ is the learning rate. The server sends the updated global gradient to the data owners for the next iteration.
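The layer computation and the update rule above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the helper names (`matvec`, `relu`, `forward`, `sgd_step`) and the toy weights are hypothetical, the activation of each layer is assumed to be the ReLU of its pre-activation, and the gradient is passed in as a given value because the gradient formula itself is not reproduced in this text.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, a: Vector) -> Vector:
    """Pre-activation of one layer: z = W a."""
    return [sum(w * x for w, x in zip(row, a)) for row in W]


def relu(z: Vector) -> Vector:
    return [max(0.0, v) for v in z]


def forward(weights: List[Matrix], x: Vector) -> List[Tuple[Vector, Vector, Vector]]:
    """Return one (input activation, z, ReLU(z)) triple per layer."""
    a = x
    trace = []
    for W in weights:
        z = matvec(W, a)
        r = relu(z)
        trace.append((a, z, r))
        a = r  # assumption: the next layer's input is ReLU(z)
    return trace


def sgd_step(W: Matrix, grad: Matrix, eta: float) -> Matrix:
    """Element-wise update W_{t+1} = W_t - eta * F_k(W)."""
    return [[w - eta * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(W, grad)]


# Toy two-layer model and a made-up gradient for the second layer.
weights = [[[1.0, -2.0], [0.5, 0.5]], [[1.0, 1.0]]]
trace = forward(weights, [2.0, 1.0])
updated = sgd_step(weights[1], [[0.5, -0.5]], eta=0.1)
```

In the scheme described here, the quantities in `trace` correspond to what organization A computes locally, while the update step is what the server (later, the smart contract) performs.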

[0080] The specific embodiments of the present invention will now be described with reference to the accompanying drawings.

[0081] Figure 1 is a flowchart of the implementation of the federated learning method. As shown in the figure, the method includes:

[0082] Step 101: The first terminal of the first organization, which is a data owner, calculates the update value using its data attributes, encrypts it with the global public key, and records it on the blockchain. The second terminal of the second organization, which is also a data owner, encrypts its data tags with the global public key and records them on the blockchain. The global public key is generated and distributed to the data owners by a trusted authority (TA).

[0083] Step 102: Through a smart contract, the first and second terminals decrypt the encrypted update value and tags using the global private key, and calculate the gradient from the decrypted update value and tags. The global private key is generated by the TA and distributed to the data owners.

[0084] Step 103: The first terminal and the second terminal record the calculated gradient on the blockchain;

[0085] Step 104: Update the model according to the gradient through a smart contract.

[0086] Figure 2 illustrates a scenario for collaborative modeling in federated learning. The scheme includes three roles: a trusted authority (TA), a blockchain provider, and data owners. Organization A (the first organization) and Organization B (the second organization) are the data owners. In this scheme, participants employ a federated learning computational model to train a Multilayer Perceptron (MLP) model.

[0087] Trusted Authority (TA): The TA's job is to generate and distribute global public-private key pairs. After generating and distributing the global public-private key pairs, the TA can go offline.

[0088] Blockchain: The blockchain reliably provides the data required by the scheme's workflow, such as key initialization and model noise; it records intermediate results and monitors the behavior of each participant so that dishonest participants can be held accountable. Furthermore, the blockchain acts as the model aggregator, the role that aggregates intermediate gradients from data owners in federated learning: it automatically aggregates encrypted gradients and updates the global model through smart contracts.

[0089] Data owners: Data owners agree that aggregators receive optimized models trained on their joint datasets. They compute local gradients locally, share local gradients, and optimize the models. In this scheme, the data owners include Organization A and Organization B. In this scheme, Organization A provides training data records, and Organization B provides data labels, thus enabling federated learning of heterogeneous data.

[0090] This solution is a specific approach to implementing blockchain-based federated learning trusted computing for heterogeneous data.

[0091] In practice, the data tags contain privacy-sensitive data.

[0092] In practice, the data tag may be further encrypted using Paillier threshold homomorphic encryption.

[0093] Specifically, the labels owned by Organization B involve privacy, so Organization B cannot transmit them directly to Organization A. The labels are therefore protected using Paillier threshold homomorphic encryption. In addition, Organizations A and B distrust each other, each believing the other might provide incorrect calculation results. In implementation, blockchain technology can be used to increase trust among the participants.

[0094] The threshold homomorphic encryption algorithm Paillier builds on the original Paillier scheme by splitting the private key into shares through Shamir's secret sharing mechanism, with threshold t. Paillier is a well-known algorithm in applied cryptography, so only its characteristics are briefly described here: decryption requires the participation of at least t participants. During decryption, each participant holding a share of the key calculates a sub-plaintext from the ciphertext and its key share. The t sub-plaintexts are then combined to obtain the decrypted result.
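The Shamir secret sharing mechanism mentioned above can be sketched as follows. This is a generic textbook illustration, not the key-splitting procedure of the patent: the field modulus, the function names `split_secret` and `reconstruct`, and the sample secret are all assumptions, and a real threshold Paillier scheme shares the decryption exponent under additional constraints that are not shown here.

```python
import secrets

# Illustrative field modulus: the Mersenne prime 2^127 - 1.
PRIME = 2**127 - 1


def split_secret(secret: int, n_shares: int, threshold: int):
    """Hide `secret` as the constant term of a random polynomial of degree
    threshold - 1 and hand out n_shares points on that polynomial."""
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Two data owners with threshold 2: both shares are needed.
shares = split_secret(123456789, n_shares=2, threshold=2)
recovered = reconstruct(shares)
```

With a threshold of 2, a single share is one point on a random line and reveals nothing about the constant term, which matches the requirement that both organizations take part in decryption.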

[0095] The following steps will be explained in detail. It should be noted that the steps are only for better understanding and do not imply a strict temporal relationship between the steps.

[0096] The calculation process for participants during implementation is as follows:

[0097] Step 0:

[0098] The TA generates the keys and assigns each data owner the global public key (PK), a private key share $SK_i$, and a verification key $VK_i$. Taking two data owners as an example, the threshold is set to 2, so decryption succeeds only when both data owners, namely organization A and organization B, participate.

[0099] Organization B encrypts its tags using the global public key (PK) and, using the "tag on-chain" method described below, records the encrypted tags on the blockchain. Because the Paillier encryption algorithm is itself a commitment scheme, once organization B has recorded the encrypted tags on the blockchain, it cannot tamper with the data midway, and others cannot obtain organization B's private data from the encrypted data on the chain.

[0100] Tag on-chain:

[0101] In implementation, data tags are encrypted using a global public key and recorded on the blockchain, including:

[0102] The second organization uses smart contracts to write data tags and other relevant data onto the blockchain; it adds predefined information through smart contracts, and after consensus is reached on the blockchain, the data is written into the blockchain's state database, and the corresponding data query function is implemented through smart contracts for subsequent auditing.

[0103] In practice, the data tags recorded on the blockchain include one or a combination of the following information:

[0104] The index of the tag data, the encrypted value of the tag data, the timestamp of the current data upload, and the identity identifier of the current data uploader.

[0105] Specifically, Organization B records the encrypted tags on the blockchain. This action provides auditability for Organization B.

[0106] Organization B uses a smart contract to write the encrypted tags and other data to the blockchain, and the smart contract automatically adds information such as the data uploader's identity and the data upload timestamp. After the blockchain reaches consensus, this data is written into the blockchain's state database, and the smart contract implements the corresponding data query function for subsequent auditing.

[0107] The data format written to the blockchain is:

[0108]

[0109] Where the value idx of Id (identifier) is the index of the tag data, and the value of Label is the encrypted value of the tag data. The value timestamp of AddTime is added automatically by the smart contract and represents the current data upload timestamp. The value userId of AddUser is also added automatically by the smart contract and represents the identity of the data uploader. All tag data uploaded to the blockchain is shared across the entire blockchain.
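The record described above can be sketched as a small Python data structure. The exact on-chain format is not reproduced in this text, so this is only an assumed shape built from the four field names given; the class `LabelContract`, its methods, the in-memory dictionary standing in for the state database, and the sample values are all hypothetical.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class LabelRecord:
    Id: str       # idx: index of the tag data
    Label: str    # encrypted tag value (a hex string here, by assumption)
    AddTime: int  # upload timestamp, filled in by the contract
    AddUser: str  # uploader identity, filled in by the contract


class LabelContract:
    """Toy stand-in for a smart contract backed by a state database."""

    def __init__(self) -> None:
        self._state: Dict[str, str] = {}

    def put_label(self, idx: str, encrypted_label: str, caller_id: str,
                  now: Optional[int] = None) -> LabelRecord:
        # The caller supplies only idx and the ciphertext; the contract
        # adds the timestamp and the uploader identity itself.
        record = LabelRecord(
            Id=idx,
            Label=encrypted_label,
            AddTime=int(time.time()) if now is None else now,
            AddUser=caller_id,
        )
        self._state[idx] = json.dumps(asdict(record), sort_keys=True)
        return record

    def query_label(self, idx: str) -> Optional[LabelRecord]:
        """Query function used for later auditing."""
        raw = self._state.get(idx)
        return None if raw is None else LabelRecord(**json.loads(raw))


contract = LabelContract()
contract.put_label("label-0001", "9f3ac2", caller_id="orgB", now=1700000000)
audited = contract.query_label("label-0001")
```

Keeping `AddTime` and `AddUser` out of the caller's control mirrors the text: the uploader cannot choose its own identity or timestamp, which is what makes the record useful for auditing.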

[0110] Step 1:

[0111] Organization A reads the updated global model from the blockchain.

[0112] Step 2:

[0113] Organization A locally computes $a^{l}$, $z^{l}$ and $\mathrm{ReLU}(z^{l})$ using its data attributes.

[0114] Organization A encrypts $a^{l}$, $z^{l}$ and $\mathrm{ReLU}(z^{l})$ using the global public key (PK) to obtain the ciphertexts $[a^{l}]$, $[z^{l}]$ and $[\mathrm{ReLU}(z^{l})]$. Using the following "gradient uplink" method, $[a^{l}]$, $[z^{l}]$, $[\mathrm{ReLU}(z^{l})]$ and the hash values of these data are recorded on the blockchain.

[0115] Because the Paillier encryption algorithm is a commitment system, once organization A records the encrypted gradient and proof on the blockchain, organization A cannot tamper with the data midway, and others cannot obtain organization A's data privacy from the encrypted data on the chain.

[0116] Gradient uplink:

[0117] In implementation, recording the update values calculated using data attributes on the blockchain after encryption with a global public key includes:

[0118] The first organization writes the update values $[a^{l}]$, $[z^{l}]$ and $[\mathrm{ReLU}(z^{l})]$, the hash values of these data, and other related data to the blockchain via a smart contract. Predefined information is added through the smart contract. After blockchain consensus is reached, the data is written to the blockchain's state database. The corresponding data query function is implemented through smart contracts for subsequent auditing.

[0119] In practice, the data attributes recorded on the blockchain include one or a combination of the following information:

[0120] The identifier of the data attribute; the current iteration number and the current layer number; the current value of $[a^{l}]$ and the hash of that value; the current value of $[z^{l}]$ and the hash of that value; the current value of $[\mathrm{ReLU}(z^{l})]$ and the hash of that value; the timestamp of the current data upload; and the identity identifier of the current data uploader.

[0121] Specifically, organization A records $[a^{l}]$, $[z^{l}]$, $[\mathrm{ReLU}(z^{l})]$ and their hash values on the blockchain. This action provides auditability for organization A.

[0122] Organization A uses a smart contract to write $[a^{l}]$, $[z^{l}]$, $[\mathrm{ReLU}(z^{l})]$, the hash values of these data, and other data to the blockchain. The smart contract also automatically adds information such as the identity of the data uploader and the data upload timestamp. After blockchain consensus, this data is written to the blockchain's state database, and the corresponding data query function is implemented through the smart contract for subsequent auditing.

[0123] The data format written to the blockchain is:

[0124]

[0125] Wherein, the value idx of Id is the identifier of this data entry; when constructing idx, both the current iteration number and the current layer number $l$ must be taken into account. ValueA is the current value of $[a^{l}]$, and HashA is $\mathrm{Hash}([a^{l}])$, the hash of that value. ValueZ is the current value of $[z^{l}]$, and HashZ is $\mathrm{Hash}([z^{l}])$, the hash of that value. ValueReLu is the current value of $[\mathrm{ReLU}(z^{l})]$, and HashReLu is $\mathrm{Hash}([\mathrm{ReLU}(z^{l})])$, the hash of that value. AddTime and AddUser have the same meaning as above. Since the gradient data is encrypted, all gradient data uploaded to the blockchain is shared across the entire blockchain.
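The field layout described above can be sketched in Python as follows. The source does not state which hash function is used or how idx is built, so SHA-256 and the `iteration:layer` identifier format are assumptions made for illustration, as are the function names and the placeholder ciphertext strings.

```python
import hashlib
from typing import Dict


def sha256_hex(ciphertext: str) -> str:
    """Assumed hash function; the text only says 'hash'."""
    return hashlib.sha256(ciphertext.encode("utf-8")).hexdigest()


def build_update_record(iteration: int, layer: int, enc_a: str, enc_z: str,
                        enc_relu: str, add_time: int, add_user: str) -> Dict[str, object]:
    """Assemble one record; idx combines the iteration and layer numbers."""
    return {
        "Id": f"{iteration}:{layer}",  # hypothetical idx format
        "ValueA": enc_a,
        "HashA": sha256_hex(enc_a),
        "ValueZ": enc_z,
        "HashZ": sha256_hex(enc_z),
        "ValueReLu": enc_relu,
        "HashReLu": sha256_hex(enc_relu),
        "AddTime": add_time,   # added by the contract in the scheme
        "AddUser": add_user,   # added by the contract in the scheme
    }


def hashes_match(record: Dict[str, object]) -> bool:
    """Audit helper: every stored hash must match its stored ciphertext."""
    return all(
        record[f"Hash{name}"] == sha256_hex(str(record[f"Value{name}"]))
        for name in ("A", "Z", "ReLu")
    )


record = build_update_record(3, 2, "c1a5", "7be0", "04d9", 1700000000, "orgA")
tampered = dict(record, ValueZ="ffff")  # ciphertext changed, hash left as is
```

An auditor reading such a record back from the state database can call `hashes_match` to detect a ciphertext that was altered after its hash was recorded.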

[0126] Step 3:

[0127] Based on $[a^{l}]$, $[z^{l}]$, the encrypted tags and $[\mathrm{ReLU}(z^{l})]$, the smart contract uses the homomorphic property of Paillier to compute the gradient ciphertext $[F_k(W)]$.

[0128] The smart contract sends gradient decryption requests to organizations A and B to decrypt $[F_k(W)]$.

[0129] Organization A and Organization B each calculate a gradient sub-plaintext using their local share of the private key and upload it to the blockchain using the "sub-plaintext on-chain" method described below.

[0130] After the smart contract performs verification and aggregates the two sub-plaintexts, it obtains the plaintext gradient $F_k(W)$. The model update is then performed via the smart contract:

[0131] $W_{t+1} = W_t - \eta F_k(W)$
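The idea of computing on ciphertexts and then combining two partial decryptions can be sketched with a toy Paillier instance. This is a deliberately simplified illustration, not the scheme of the patent: it uses tiny fixed primes, integer plaintexts, and an additive 2-of-2 split of the decryption exponent instead of Shamir sharing, and it omits the verification keys entirely. All names and constants are assumptions.

```python
import secrets
from math import gcd

# Toy primes, far too small for real use.
P, Q = 104729, 1299709
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
# D is 0 modulo LAM and 1 modulo N, so c^D mod N^2 equals 1 + m*N.
D = LAM * pow(LAM, -1, N)


def encrypt(m: int) -> int:
    """Paillier encryption with generator g = 1 + N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if gcd(r, N) == 1:
            break
    return (pow(1 + N, m, N2) * pow(r, N, N2)) % N2


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N2


def deal_shares():
    """Split D additively into two shares (simplified 2-of-2 sharing)."""
    s1 = secrets.randbelow(N * LAM)
    s2 = (D - s1) % (N * LAM)
    return s1, s2


def partial_decrypt(c: int, share: int) -> int:
    """Each party raises the ciphertext to its own share."""
    return pow(c, share, N2)


def combine(partials) -> int:
    """Multiply the partial results, then apply L(u) = (u - 1) / N."""
    product = 1
    for part in partials:
        product = (product * part) % N2
    return ((product - 1) // N) % N


share_a, share_b = deal_shares()
c_sum = add_encrypted(encrypt(20), encrypt(22))
plain = combine([partial_decrypt(c_sum, share_a), partial_decrypt(c_sum, share_b)])
```

Real-valued gradients would need a fixed-point encoding into integers modulo N before encryption, which this sketch leaves out.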

[0132] Sub-plaintext on-chain:

[0133] In practice, the gradients recorded on the blockchain are uploaded in the form of private data.

[0134] Organizations A and B each upload the sub-plaintext they computed to the blockchain in the form of private data.

[0135] The format of the data written as private data is as follows:

[0136]

[0137] Here, the value of Id, idx, is the identifier of the data, and the value of Value is the sub-plaintext of the gradient data $[F_k(W)]$ computed by Organization A or B.

[0138] Step 4: Verify the accuracy of the updated model through a smart contract.

[0139] Specifically, the smart contract verifies the accuracy of the updated model. If the model accuracy is not lower than the accuracy of the previous model, return to step 1 to proceed to the next iteration.

[0140] In implementation, if the model's accuracy decreases, verification is performed.

[0141] In practice, verification includes:

[0142] The gradient is calculated after obtaining unencrypted data attributes from the first terminal and unencrypted data tags from the second terminal.

[0143] Specifically, if the model accuracy decreases, a computational verification step is initiated: Organizations A and B are required to publicly disclose the data they recorded on the blockchain. Based on Organization B's plaintext tag data and Organization A's plaintext computation results $a_l$, $z_l$, and $\mathrm{ReLu}(z_l)$, the smart contract performs the calculation and checks whether Organization A's computation result in step 2 is correct. If it is incorrect, Organization A is held accountable; if it is correct, Organization B is held accountable.
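The decision logic of Step 4 can be sketched as follows. This is a hedged illustration only: the layer form z = W·a + b, the tolerance, and the function names are assumptions, and the real contract would operate on the data disclosed from the blockchain rather than on in-memory lists.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def forward_layer(weights, bias, a_prev):
    """Recompute z = W * a_prev + b and ReLu(z); the layer form is an assumption."""
    z = [sum(w * x for w, x in zip(row, a_prev)) + b
         for row, b in zip(weights, bias)]
    return z, relu(z)


def next_action(prev_accuracy, new_accuracy):
    """Continue training unless the accuracy dropped, in which case verify."""
    return "next_iteration" if new_accuracy >= prev_accuracy else "verify"


def close(xs, ys, tol=1e-9):
    return len(xs) == len(ys) and all(abs(x - y) <= tol for x, y in zip(xs, ys))


def accountable_party(weights, bias, a_prev, disclosed_z, disclosed_relu):
    """Return "A" if A's disclosed step-2 results are wrong, otherwise "B"."""
    z, activated = forward_layer(weights, bias, a_prev)
    a_is_correct = close(z, disclosed_z) and close(activated, disclosed_relu)
    return "B" if a_is_correct else "A"


W = [[1.0, -2.0], [0.5, 0.5]]
b = [0.0, 1.0]
a_prev = [2.0, 3.0]
honest = accountable_party(W, b, a_prev, [-4.0, 3.5], [0.0, 3.5])
dishonest = accountable_party(W, b, a_prev, [-4.0, 3.0], [0.0, 3.0])
```

The rule mirrors the text: once A's disclosed computation is shown to be consistent, the only remaining source of the accuracy drop is B's tag data.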

[0144] Based on the same inventive concept, this invention also provides a federated learning blockchain system and a computer-readable storage medium. Since the principles by which these devices solve the problem are similar to those of the federated learning method, their implementation can refer to the implementation of the method, and overlapping details are not repeated here.

[0145] When implementing the technical solutions provided in the embodiments of the present invention, they can be implemented in the following manner.

[0146] Figure 3 is a schematic diagram of the architecture of the federated learning blockchain system. As shown, the system includes:

[0147] A federated learning blockchain system includes: a trusted authority, a first terminal belonging to a first organization of data owners, a second terminal belonging to a second organization of data owners, and a blockchain with the first terminal and the second terminal as nodes, wherein:

[0148] A trusted authority is used to generate global public and private keys and distribute them to data owners;

[0149] The first terminal is used to calculate the update value using data attributes and, after encrypting it with the global public key, record it on the blockchain; through a smart contract, the first terminal and the second terminal use the global private key to decrypt the encrypted update value and tag, calculate the gradient based on the decrypted update value and tag, and record the calculated gradient on the blockchain;

[0150] The model update is executed via the smart contract according to the gradient;

[0151] The second terminal is used to encrypt the data tag with the global public key and record it on the blockchain; through a smart contract, the first terminal and the second terminal use the global private key to decrypt the encrypted update value and tag, calculate the gradient based on the decrypted update value and tag, and record the calculated gradient on the blockchain;

[0152] The model update is executed via the smart contract according to the gradient.

[0153] In practice, the data tags further processed by the second terminal contain privacy-related data.

[0154] In practice, the second terminal is further used to encrypt the data tag using Paillier threshold homomorphic encryption.

[0155] In implementation, when recording the data tags on the blockchain after encryption with the global public key, the second terminal is further used for the following:

[0156] The second organization uses smart contracts to write data tags and other relevant data onto the blockchain; it adds predefined information through smart contracts, and after consensus is reached on the blockchain, the data is written into the blockchain's state database, and the corresponding data query function is implemented through smart contracts for subsequent auditing.

[0157] In implementation, the data tags that the second terminal records on the blockchain include one or a combination of the following information:

[0158] The index of the tag data, the encrypted value of the tag data, the timestamp of the current data upload, and the identity identifier of the current data uploader.
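The write-and-query behavior described for the tag data can be sketched with an in-memory stand-in for the state database. This is not chaincode for any particular blockchain platform: the class name `TagLedger`, its methods, and the rule that an existing index cannot be overwritten are assumptions for illustration. The field names Id, Value, AddTime, and AddUser follow the record fields named elsewhere in this description.

```python
import time


class TagLedger:
    """In-memory stand-in for the state database behind a smart contract."""

    def __init__(self, clock=time.time):
        self._state = {}
        self._clock = clock

    def put_tag(self, caller, idx, encrypted_value):
        """Store an encrypted tag; timestamp and uploader are added automatically."""
        if idx in self._state:
            raise ValueError(f"record {idx!r} already exists and cannot be modified")
        record = {
            "Id": idx,
            "Value": encrypted_value,
            "AddTime": int(self._clock()),
            "AddUser": caller,
        }
        self._state[idx] = record
        return dict(record)

    def query(self, idx):
        """Query function used for later auditing; returns a copy."""
        return dict(self._state[idx])

    def query_by_user(self, user):
        return [dict(r) for r in self._state.values() if r["AddUser"] == user]


ledger = TagLedger(clock=lambda: 1700000000)
stored = ledger.put_tag("orgB", "iter-1", "c0ffee")
```

Having the contract, rather than the caller, fill in AddTime and AddUser is what makes the record usable for accountability: the uploader cannot choose its own identity or timestamp.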

[0159] In implementation, when recording the update value calculated using data attributes on the blockchain after encryption with the global public key, the first terminal is further used for the following:

[0160] The first organization writes the update values $[a_l]$, $[z_l]$, and $[\mathrm{ReLu}(z_l)]$, together with their hash values and related data, to the blockchain via a smart contract. Predefined information is added through the smart contract. After blockchain consensus, the data is written to the blockchain's state database, and a corresponding data query function is implemented through the smart contract for subsequent auditing.

[0161] In implementation, the data attributes that the first terminal records on the blockchain include one or a combination of the following:

[0162] The identifier of the data attribute, the current iteration number and the current layer number, the current value of $[a_l]$, the hash of $[a_l]$, the current value of $[z_l]$, the hash of $[z_l]$, the current value of $[\mathrm{ReLu}(z_l)]$, the hash of $[\mathrm{ReLu}(z_l)]$, the timestamp of the current data upload, and the identity identifier of the current data uploader.

[0163] In practice, the gradients recorded on the blockchain by the second terminal and the first terminal are recorded in the form of privacy data.

[0164] In practice, the second terminal and the first terminal are further used to verify the accuracy of the updated model through smart contracts.

[0165] In implementation, the system further includes:

[0166] A block manager, used to perform verification if the model's accuracy decreases.

[0167] In implementation, when performing verification, the block manager is further used for the following:

[0168] The gradient is calculated after obtaining unencrypted data attributes from the first terminal and unencrypted data tags from the second terminal.

[0169] This invention also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described federated learning method.

[0170] For details on implementation, please refer to the implementation of the federated learning method.

[0171] In summary, the technical solutions provided in the embodiments of this invention propose a blockchain-based, privacy-preserving federated learning approach for situations involving incomplete data owners, that is, where one or more data owners possess data attributes but lack labels. The approach provides technical solutions for label data protection and model data sharing. It has at least the following technical effects:

[0172] (1) Security.

[0173] During collaborative modeling, the data of Organization A and Organization B is never transmitted externally in plaintext; it is encrypted first and distributed only as ciphertext. Data security in this solution therefore rests on the Paillier encryption algorithm, which is semantically secure.

[0174] (2) Auditing of data owners.

[0175] Since the data of Organization A and Organization B is recorded on the blockchain, these records cannot be modified or denied. Furthermore, after the smart contract verifies the model's accuracy, any malicious participant that is found can be held accountable directly.

[0176] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.

[0177] This invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, executed by the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0178] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0179] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0180] Obviously, those skilled in the art can make various modifications and variations to this invention without departing from its spirit and scope. Therefore, if these modifications and variations fall within the scope of the claims of this invention and their equivalents, this invention also intends to include these modifications and variations.

Claims

1. A federated learning method, characterized in that it includes: a first terminal of a first organization belonging to the data owners encrypts, with a global public key, the update value calculated using data attributes and records it on the blockchain, and a second terminal of a second organization belonging to the data owners encrypts the data tag with the global public key and records it on the blockchain, wherein the global public key is generated and distributed to the data owners by a trusted authority; through a smart contract, the first terminal and the second terminal use a global private key to decrypt the encrypted update value and tag, and calculate the gradient based on the decrypted update value and tag, wherein the global private key is generated and distributed to the data owners by the trusted authority; the first terminal and the second terminal record the calculated gradient on the blockchain; the model update is executed via the smart contract according to the gradient; wherein encrypting the data tag with the global public key and recording it on the blockchain includes: the second organization writes the data tag and other related data to the blockchain via a smart contract; predefined information is added through the smart contract; after blockchain consensus, the data is written to the blockchain's state database; and a corresponding data query function is implemented through the smart contract for subsequent auditing.

2. The method as described in claim 1, characterized in that the data tags contain privacy-protected data.

3. The method as described in claim 2, characterized in that it further includes: the data tags are encrypted using Paillier threshold homomorphic encryption.

4. The method as described in claim 1, characterized in that the data tags recorded on the blockchain include one or a combination of the following information: the index of the tag data, the encrypted value of the tag data, the timestamp of the current data upload, and the identity identifier of the current data uploader.

5. The method as described in claim 1, characterized in that recording the update value calculated using data attributes on the blockchain after encryption with the global public key includes: the first organization writes the update values, their hash values, and related data to the blockchain via a smart contract; predefined information is added through the smart contract; after blockchain consensus, the data is written to the blockchain's state database; and a corresponding data query function is implemented through the smart contract for subsequent auditing.

6. The method as described in claim 5, characterized in that the data attributes recorded on the blockchain include one or a combination of the following: the data attribute identifier, the current iteration number and the current layer number, the current value of $[a_l]$, the hash of $[a_l]$, the current value of $[z_l]$, the hash of $[z_l]$, the current value of $[\mathrm{ReLu}(z_l)]$, the hash of $[\mathrm{ReLu}(z_l)]$, the timestamp of the current data upload, and the identity identifier of the current data uploader.

7. The method as described in claim 1, characterized in that the gradients recorded on the blockchain are uploaded in the form of private data.

8. The method according to any one of claims 1 to 7, characterized in that it further includes: the accuracy of the updated model is verified through a smart contract.

9. The method as described in claim 8, characterized in that it further includes: if the model's accuracy decreases, verification is performed.

10. The method as described in claim 9, characterized in that the verification includes: the gradient is calculated after obtaining unencrypted data attributes from the first terminal and unencrypted data tags from the second terminal.

11. A federated learning blockchain system, characterized in that it includes: a trusted authority, a first terminal of a first organization belonging to the data owners, a second terminal of a second organization belonging to the data owners, and a blockchain with the first terminal and the second terminal as nodes, wherein: the trusted authority is used to generate the global public key and the global private key and distribute them to the data owners; the first terminal is used to calculate the update value using data attributes and, after encrypting it with the global public key, record it on the blockchain; through a smart contract, the first terminal and the second terminal use the global private key to decrypt the encrypted update value and tag, calculate the gradient based on the decrypted update value and tag, and record the calculated gradient on the blockchain; the model update is executed via the smart contract according to the gradient; the second terminal is used to encrypt the data tag with the global public key and record it on the blockchain; through a smart contract, the first terminal and the second terminal use the global private key to decrypt the encrypted update value and tag, calculate the gradient based on the decrypted update value and tag, and record the calculated gradient on the blockchain; the model update is executed via the smart contract according to the gradient; wherein encrypting the data tag with the global public key and recording it on the blockchain includes: the second organization writes the data tag and other related data to the blockchain via a smart contract; predefined information is added through the smart contract; after blockchain consensus, the data is written to the blockchain's state database; and a corresponding data query function is implemented through the smart contract for subsequent auditing.

12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by a processor, implements the method of any one of claims 1 to 10.