A federated learning hierarchical incentive method based on personalized privacy protection
By designing a personalized incentive strategy using a generative diffusion model (GDM), the privacy protection requirements of LMO and Workers in federated learning were addressed. The incentive strategy was optimized, the training efficiency of the global model was improved, a balance between privacy protection and model quality was achieved, and the information asymmetry problem was solved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- Hunan University of Technology and Business (湖南工商大学)
- Filing Date
- 2025-07-03
- Publication Date
- 2026-06-19
AI Technical Summary
In the hierarchical architecture of federated learning, existing incentive schemes fail to effectively consider the personalized privacy protection needs of LMO and Workers, resulting in an imbalance between the level of privacy protection and model performance. Information asymmetry leads to inefficient incentive strategy design, and the lack of systematic modeling and optimization of hierarchical privacy budgets affects the convergence efficiency and generalization ability of the global model.
A generative diffusion model (GDM) is used to design personalized incentive strategies. The GDM is used to optimize the multi-dimensional incentive strategy elements and generate personalized incentive strategies for different LMO characteristics. Combining privacy preferences and data contribution capabilities, a worker selection scheme is constructed, and hierarchical privacy budget constraints are set to optimize LMO and worker data contribution behavior.
This approach balances privacy protection with model quality, increases LMO and Worker participation, improves global model training efficiency, keeps the incentive strategies effective, and resolves the information asymmetry problem.
Figure CN120806066B_ABST
Description
Technical Field
[0001] This invention belongs to the field of federated learning technology, and in particular relates to a hierarchical incentive method for federated learning based on personalized privacy protection.
Background Technology
[0002] The innovation of mobile computing technology and the large-scale deployment of IoT devices have driven the exponential growth of sensed data, providing a rich multimodal data foundation for machine learning model training. However, the traditional cloud computing paradigm relies on terminal devices uploading raw data to a central server, facing two challenges: first, massive data transmission overloads network bandwidth resources, making it difficult to meet the low-latency requirements of real-time sensitive scenarios; second, cross-domain data flow exacerbates the risk of privacy leaks, making it difficult to comply with increasingly stringent data sovereignty regulations (such as GDPR). Against this backdrop, Federated Learning (FL) achieves a paradigm breakthrough by reconstructing the distributed collaboration mechanism: the Local Model Owner (LMO) trains model parameters based on the local dataset, and the Task Publisher (TP) uses a Federated Averaging (FedAvg) algorithm to aggregate distributed parameter updates and iteratively generate a global model. Compared to traditional cloud computing, FL, through its "data localization processing - lightweight model transmission" mechanism, avoids cross-domain transmission of raw data while reducing communication overhead by 1-2 orders of magnitude, significantly improving privacy protection and resource utilization efficiency.
[0003] While federated learning has demonstrated broad application potential in fields such as healthcare (e.g., cross-hospital collaborative diagnosis with NVIDIA Clara FL) and connected vehicles (e.g., collaborative drone perception), its practical deployment still faces a core bottleneck: existing research largely assumes that LMOs have sufficient local data to support high-quality model training. However, in reality, limited by device sensing capabilities, storage capacity, or user participation, some LMOs may face challenges such as insufficient data scale or incomplete category coverage, leading to deterioration in global model convergence efficiency and generalization ability. Recruiting Workers to collect additional relevant data is therefore an effective way to solve this problem. A three-layer collaborative architecture of "TP-LMOs-Workers" can be constructed: the TP acts as the global coordinator, issuing tasks and designing incentive strategies; LMOs, as the middle layer, participate in federated model training and recruit Workers to collect key data to optimize local models; and Workers, as the bottom-level executors, collect data through sensing devices and submit it to the corresponding LMO. This extension can alleviate the problem of insufficient data, but how the TP should design effective incentive strategies to drive LMOs to participate in FL and provide high-quality local models, and how LMOs should incentivize Workers to collect and upload high-value data, remain core challenges in hierarchical FL incentive research.
[0004] Currently, contract theory, auction mechanisms, and game theory are the three most commonly used methods in federated learning incentive schemes. Contract theory guides participants to act in the expected manner by designing contracts that satisfy individual rationality and incentive compatibility; auction mechanisms achieve optimal resource allocation and automatic price discovery by introducing a bidding process; and game theory reflects the game equilibrium behavior in the incentive process by modeling the strategic interactions among multiple participants. These methods are mostly limited to single-layer FL architectures and have not yet deeply considered the need for LMOs to incentivize Workers to collect the required data. Furthermore, the following key issues in existing hierarchical incentive mechanisms for federated learning still require further research and resolution.
[0005] (1) Both LMO and Worker lack personalized privacy incentives.
[0006] In the Federated Learning (FL) framework, the core of the incentive scheme lies in driving participants at each layer to actively participate and contribute high-quality data or models to improve the convergence and generalization ability of the global model. However, existing incentive scheme research mainly focuses on designing incentive strategies to compensate for data volume and computing resources, without fully considering the privacy protection needs of LMOs when providing local model parameters. Although FL mitigates the risk of privacy leakage to some extent by avoiding the sharing of raw data, local model parameters may still be vulnerable to inference attacks, thereby exposing data distribution or sensitive information. To address this, some studies have introduced Differential Privacy (DP) mechanisms, adding noise to local model parameters to enhance privacy protection. However, most existing incentive schemes assume that all LMOs share the same privacy budget, ignoring the heterogeneity of privacy preferences among LMOs in real-world scenarios. This fixed privacy budget strategy may lead to an incentive imbalance between the level of privacy protection and model performance, thereby reducing the incentive for LMOs to participate in FL. Under the TP-LMOs-Workers hierarchical incentive framework, the incentive problem for privacy protection becomes increasingly complex. While the addition of Workers alleviates the challenge of insufficient data, their data contribution behavior is significantly affected by privacy risks. Workers may refuse to provide data for fear of data leakage. 
Moreover, since different Workers have different levels of privacy sensitivity, if the incentive scheme cannot reasonably compensate for the corresponding privacy losses, Workers may reduce data quality (e.g., by adding excessive noise) or directly withdraw from the FL task, thereby affecting the local model training of LMO and further weakening the global model performance. Therefore, how to effectively incentivize LMOs to participate in FL, and further construct an incentive scheme that takes into account the privacy preferences of different Workers, so that TP can provide differentiated incentives based on different LMOs and LMOs can provide differentiated incentives based on the privacy needs of different Workers, in order to optimize the balance between privacy protection and model quality, is an important research direction for improving the incentive effectiveness of the FL system.
[0007] (2) Lack of consideration for hierarchical privacy budget constraints
[0008] In hierarchical incentive schemes for federated learning, effective incentives from the Task Publisher (TP) to the Local Model Owners (LMOs) are a prerequisite for the LMOs to subsequently incentivize Workers to contribute data. The data provided by Workers affects the training of local models, ultimately impacting the global model. Since both Worker data and pre-trained model parameters require privacy perturbations, the overall model quality is significantly reduced. However, in existing hierarchical architectures, the TP typically only focuses on privacy protection of model parameters uploaded by LMOs, without directly intervening in the privacy budget settings of Workers during the LMO data procurement process. Nevertheless, the TP-LMOs-Workers three-layer architecture exhibits a certain degree of coupling in privacy budget control; that is, the total TP budget determines the upper limit of the privacy budget for each LMO under a given model quality, while LMOs need to further regulate the data privacy budget of Workers under the constraints of the TP privacy budget to ensure the synergistic optimization of the overall privacy protection strategy and model quality. Existing research generally lacks systematic modeling and optimization analysis of hierarchical privacy budget constraints, resulting in a significant deficiency in the synergistic optimization between privacy protection and model performance.
[0009] (3) Incentive challenges under information asymmetry
[0010] In a federated learning architecture at the TP-LMO level, significant differences exist among LMOs in terms of data scale and privacy preferences, making it difficult for the TP to comprehensively and accurately assess their true contributions. Particularly regarding privacy protection, some LMOs, concerned about privacy breaches, may refuse to disclose the true data scale or quality, or even exaggerate their data contributions to obtain more incentives. This information asymmetry interferes with the TP's reasonable assessment of participant utility, affecting the effectiveness of resource allocation and contribution pricing, and may ultimately degrade model aggregation performance so that the system's requirements for global model accuracy and stability are not met. Contract theory, as a classic approach to addressing information asymmetry, is widely used in incentive scheme design. Traditional research often uses parameter configuration methods with incentive compatibility constraints to guide participants to truthfully report their private information, thereby optimizing resource allocation and improving system efficiency. However, existing methods are mostly based on one-dimensional or low-dimensional utility functions, making it difficult to address the challenges posed by multi-dimensional heterogeneous factors in federated learning. On the one hand, the utility function of an LMO often exhibits highly nonlinear and coupled characteristics under multidimensional conditions, increasing the difficulty of theoretically deriving and solving for the incentive strategy parameters. On the other hand, traditional methods usually rely on preset distributions or functional forms, which are difficult to adapt to dynamic and complex system environments and are prone to getting trapped in local optima, thus limiting the effectiveness and global optimality of the incentive strategy.
Summary of the Invention
[0011] To address the aforementioned technical issues, this invention proposes a layered incentive method for federated learning based on personalized privacy protection. Personalized incentive and selection methods are constructed at two levels: TP-LMOs and LMO-Workers, respectively, to optimize the balance between data quality, privacy protection, and global model training performance.
[0012] To achieve the above objectives, this invention provides a layered incentive method for federated learning based on personalized privacy protection, comprising:
[0013] The task publisher TP uses a generative diffusion model to jointly analyze the privacy preferences and data purchasing power of the local model owner LMO;
[0014] Generate a set of triplet incentive policies that include the effective amount of data, privacy budget, and reward;
[0015] Each LMO selects a corresponding incentive strategy based on its own type, and filters worker combinations that meet the effective data volume requirements based on incentive strategy constraints.
[0016] Each Worker adds noise to its raw data according to its declared privacy budget and then submits the data to the corresponding LMO;
[0017] Each LMO aggregates the noisy data to train a local model, perturbs the model parameters according to the privacy budget constraint in its selected incentive strategy, and then uploads them to the TP.
[0018] Optionally, the training process of the generative diffusion model includes:
[0019] During the forward diffusion phase, Gaussian noise is gradually added to the incentive strategy vector;
[0020] During the reverse diffusion phase, the optimal incentive strategy distribution is recovered through a neural network.
[0021] The environment state input includes the number of LMO participants, the number of categories, the minimum privacy budget and the total incentive budget in the incentive strategy.
[0022] Optionally, the Worker screening process includes:
[0023] Calculate the cost per unit of effective data:
[0024] Select the Worker group by sorting these unit-cost values in ascending order;
[0025] The amount of valid data
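The screening steps above can be illustrated with a minimal sketch. The source does not show the unit-cost formula, so the sketch assumes it is a Worker's asking price divided by its effective data amount; the `WorkerBid` fields and the `select_workers` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class WorkerBid:
    worker_id: str
    bid: float             # asking price of the Worker (hypothetical field)
    effective_data: float  # effective data after DP noise, must be > 0 (hypothetical field)


def select_workers(bids, required_effective_data):
    """Greedy screening: lowest assumed unit cost first, until the requirement is met."""
    # ASSUMPTION: unit cost = bid / effective_data; the source formula is not shown.
    ranked = sorted(bids, key=lambda b: b.bid / b.effective_data)
    selected, total = [], 0.0
    for b in ranked:
        if total >= required_effective_data:
            break
        selected.append(b)
        total += b.effective_data
    if total < required_effective_data:
        return None, total  # the available Workers cannot meet the requirement
    return selected, total


bids = [
    WorkerBid("w1", bid=10.0, effective_data=50.0),  # unit cost 0.20
    WorkerBid("w2", bid=6.0, effective_data=40.0),   # unit cost 0.15
    WorkerBid("w3", bid=9.0, effective_data=30.0),   # unit cost 0.30
]
selected, total = select_workers(bids, required_effective_data=80.0)
print([b.worker_id for b in selected], total)  # ['w2', 'w1'] 90.0
```

A greedy pass is only one possible selection rule; it is used here because it follows directly from sorting by unit cost in ascending order.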
[0026] Optionally, the generation conditions for the triplet incentive strategy include:
[0027] The privacy budget must be greater than the lower threshold set by TP;
[0028] The compensation is positively correlated with the amount of valid data.
[0029] The total incentive amount for a single LMO shall not exceed a preset proportion of its data purchasing power.
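As a rough illustration of the three generation conditions, the sketch below checks a candidate strategy set against them. It is an assumption-laden example: "positively correlated" is interpreted as rewards not decreasing when strategies are ordered by effective data amount, and the names `Triple`, `check_menu`, `eps_min`, and `cap_ratio` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    effective_data: float  # l: effective amount of data
    privacy_budget: float  # epsilon: privacy budget for model upload
    reward: float          # r: reward paid by the TP


def check_menu(menu, eps_min, cap_ratio):
    """menu: list of (purchasing_power, Triple) pairs; returns the violated conditions."""
    violations = []
    # Condition 1: the privacy budget must exceed the TP's lower threshold.
    if any(t.privacy_budget <= eps_min for _, t in menu):
        violations.append("privacy_floor")
    # Condition 2 (assumed reading): reward does not decrease as effective data grows.
    by_data = sorted((t for _, t in menu), key=lambda t: t.effective_data)
    if any(a.reward > b.reward for a, b in zip(by_data, by_data[1:])):
        violations.append("reward_monotonicity")
    # Condition 3: reward is capped at a preset proportion of purchasing power.
    if any(t.reward > cap_ratio * power for power, t in menu):
        violations.append("reward_cap")
    return violations


menu = [
    (100.0, Triple(effective_data=50.0, privacy_budget=1.0, reward=20.0)),
    (200.0, Triple(effective_data=80.0, privacy_budget=2.0, reward=35.0)),
]
print(check_menu(menu, eps_min=0.5, cap_ratio=0.3))
```

In the described method the GDM generates the strategies; a check like this could only serve as a filter or a penalty signal on generated candidates.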
[0030] Optionally, the hierarchical privacy budget constraints include:
[0031] TP sets a privacy budget floor for model uploads for each type of LMO through personalized incentive strategies;
[0032] Workers independently declare their data privacy budgets. During the LMO screening process, the total amount of valid data after differential privacy processing must not be less than the valid data amount requirement in the LMO's selected incentive strategy. The valid data amount indirectly constrains the Worker's privacy budget.
[0033] Optionally, the data noise-adding process includes:
[0034] The Worker layer adds noise drawn from the prescribed distribution to its raw data $D_i$;
[0035] The LMO layer adds Laplace noise to the model parameters in accordance with differential privacy requirements.
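The LMO-layer perturbation can be sketched with the standard Laplace mechanism. This is a minimal example, not the patented procedure: the `sensitivity` value (for instance, obtained by clipping the parameters) is an assumption, and `laplace_noise` and `perturb_parameters` are hypothetical helper names.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_parameters(params, epsilon, sensitivity, rng):
    """Laplace mechanism: add Laplace(0, sensitivity / epsilon) noise to each coordinate."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon  # a smaller privacy budget means more noise
    return [p + laplace_noise(scale, rng) for p in params]


rng = random.Random(0)
params = [0.5, -1.2, 3.0]
# ASSUMPTION: L1 sensitivity of 0.1, e.g. enforced by clipping before upload.
noisy = perturb_parameters(params, epsilon=1.0, sensitivity=0.1, rng=rng)
print(noisy)
```

The noise scale is inversely proportional to the privacy budget, which is why a lower bound on the budget set by the TP also bounds how much noise an LMO may add.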
[0036] Technical Effects of this Invention: This invention discloses a layered incentive method for federated learning based on personalized privacy protection. It proposes an incentive strategy design driven by a Generative Diffusion Model (GDM) at the TP-LMOs layer. By combining the GDM with the multi-dimensional incentive strategy elements for optimization, and considering LMO privacy preferences and data contribution capabilities, personalized incentive strategies are generated for different LMO characteristics, alleviating information asymmetry and incentivizing LMOs to participate in and contribute high-quality models. Furthermore, an LMO-Workers layer worker selection scheme based on privacy preferences and data contribution is constructed. By comprehensively considering workers' personalized privacy preferences and data contribution levels, a reasonable bidding and selection method is designed to incentivize workers to participate in data provision. Simultaneously, it guides LMOs to select a more cost-effective set of workers under privacy constraints, thereby effectively supporting the completion of upper-layer incentive tasks. 
The design employs a hierarchical privacy budget constraint strategy, constructing a TP-LMOs-Workers privacy budget constraint chain. At the TP-LMOs layer, the TP sets a lower limit on the model-upload privacy budget for each type of LMO through the incentive strategy set, constraining the differential privacy perturbation that LMOs apply to local model parameters to protect privacy. At the LMO-Workers layer, Workers are allowed to independently declare their privacy budget for data collection, and each LMO, through a screening mechanism, requires that the total effective amount of data from the selected Workers after differential privacy processing is not less than the effective data amount required by its selected incentive strategy, thus indirectly constraining the match between each Worker's privacy budget and data contribution through the effective data amount.
Description of the Drawings
[0037] The accompanying drawings, which form part of this application, are used to provide a further understanding of this application. The illustrative embodiments and descriptions of this application are used to explain this application and do not constitute an undue limitation of this application. In the drawings:
[0038] Figure 1 is a flowchart of a layered incentive method for federated learning based on personalized privacy protection, according to an embodiment of the present invention.
[0039] Figure 2 is a schematic diagram of how model accuracy evolves over training rounds in an embodiment of the present invention.
Detailed Implementation
[0040] It should be noted that, unless otherwise specified, the embodiments and features described in this application can be combined with each other. This application will now be described in detail with reference to the accompanying drawings and embodiments.
[0041] It should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in a computer system such as a set of computer-executable instructions, and although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in a different order than that shown here.
[0042] Contract theory, a core component of information economics and game theory, aims to address information asymmetry in principal-agent relationships. Its core principle is to design incentive-compatible contracts that ensure agents, who possess private information or engage in covert actions, align their behavior with the principal's goals. In modern information systems such as federated learning, edge computing, and data trading platforms, system designers often face information asymmetry challenges when interacting with stakeholders possessing private information (e.g., local devices, data providers). These challenges stem from a lack of knowledge about the stakeholders' true capabilities, costs, data quality, and privacy preferences. As a significant branch of game theory, contract theory constructs a contract system that incorporates type identification and incentive constraints. This system, while protecting the individual interests of stakeholders, guides them to actively choose appropriate contracts and truthfully disclose information, thereby optimizing the overall system efficiency.
[0043] The core objective of contract design is to maximize the principal's utility under conditions of information asymmetry, while ensuring the agent's voluntary participation and honest action. Contract design must satisfy the following two key constraints:
[0044] Incentive compatibility (IC) means that each agent will choose the contract that best suits its own type and will not misreport the type.
[0045] Individual rationality (IR) means that the utility of each agent after choosing a contract is no less than their reserved utility (which can be set to 0).
[0046] A typical contract design process can be divided into the following three steps:
[0047] Step 1: Model Definition and Utility Function:
[0048] First, it is necessary to clarify the roles of all parties involved in the system and their interaction methods, and construct a utility model. Assume the system consists of a principal with decision-making power (such as a platform) and several agents with private information (such as nodes and users). The type of each agent, denoted θ, represents its private characteristics (such as data quality and computing power) and follows a probability distribution within a certain range. Types are usually assumed to be monotonic: higher types represent stronger capabilities and lower costs.
[0049] The platform designs a contract (t, r) for each type of agent, where:
[0050] t(θ): The amount of tasks (such as model training intensity, data upload volume, etc.) assigned by the platform to agents of type θ;
[0051] r(θ): The corresponding reward (such as money, points or resources).
[0052] An agent's utility function is typically defined as the reward they receive minus the cost required to complete the task: U(θ) = r(θ) - C(θ, t(θ));
[0053] C(θ, t(θ)) represents the cost function required for the agent to complete the task, which generally satisfies the condition that the stronger the ability, the slower the task cost increases.
[0054] The platform's goal is to maximize the utility it derives from the actions of its agents. Its utility is the value it gains from the agents' completion of tasks minus the compensation it pays them:
[0055] V(θ)=V(t(θ))-r(θ);
[0056] Here, V(t(θ)) represents the value brought by the task, such as training accuracy or inference coverage.
[0057] Step 2: Setting Constraints:
[0058] After the model is established, in order to ensure that agents can still "honestly" choose contract items that match their own type in an environment of opaque information, the contract design must satisfy two key constraints:
[0059] Incentive Compatibility (IC) Constraint:
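[0060] $U(\theta) = r(\theta) - C(\theta, t(\theta)) \ge r(\theta') - C(\theta, t(\theta')), \quad \forall\, \theta' \ne \theta$;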
[0061] This ensures that each agent obtains maximum utility from choosing contracts that match their own type, and prevents agents from misreporting their type to seek higher returns.
[0062] Individual Rationality (IR):
[0063] U(θ)≥0;
[0064] The utility of an agent's participation is guaranteed to be no less than their reserved utility (usually set to 0); otherwise, the agent will not choose to join the system.
[0065] Step 3: Optimize the objective and problem-solving:
[0066] After clarifying the utility model and constraints, the contract design problem is transformed into a constrained optimization problem. The goal is to maximize the expected revenue of the platform. Designers need to determine the optimal task allocation function and reward function while satisfying the IC and IR conditions.
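[0067] $\max_{t(\cdot),\, r(\cdot)} \int \left[ V(t(\theta)) - r(\theta) \right] f(\theta)\, d\theta$, subject to the IC and IR constraints;
[0068] Where f(θ) is the probability density function of the agent type θ over its type interval, which is also the domain of integration.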
[0069] Combining the constraints in step 2, the optimal contract can be solved using mathematical methods (Lagrange multiplier method, KKT condition derivation, etc.). The final optimal contract has a self-selection property: high-type agents will choose contract terms with high tasks and high rewards, thus achieving a balance between maximizing system benefits and unifying agent incentives.
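The IC and IR conditions and the self-selection property can be made concrete on a small discrete example. The sketch below is illustrative only: the cost function $C(\theta, t) = t/\theta$ and the two-type contract menus are assumptions chosen for the example, not values from the source.

```python
def cost(theta, t):
    # ASSUMED cost function: higher types complete the same task more cheaply.
    return t / theta


def utility(theta, contract):
    t, r = contract  # contract = (task amount t, reward r)
    return r - cost(theta, t)


def check_contracts(menu):
    """menu maps each type theta to its intended contract (t, r); returns (IC, IR)."""
    # IR: every type gets non-negative utility from its own contract.
    ir = all(utility(theta, c) >= 0 for theta, c in menu.items())
    # IC: no type prefers a contract designed for another type.
    ic = all(
        utility(theta, menu[theta]) >= utility(theta, other)
        for theta in menu
        for other in menu.values()
    )
    return ic, ir


good_menu = {1.0: (1.0, 1.0), 2.0: (3.0, 2.2)}
bad_menu = {1.0: (1.0, 1.0), 2.0: (3.0, 3.5)}  # overpays the high-type contract
print(check_contracts(good_menu))  # (True, True)
print(check_contracts(bad_menu))   # (False, True)
```

In `good_menu` the high type self-selects the high-task, high-reward contract, while in `bad_menu` the low type would gain by misreporting, which is exactly what the IC constraint rules out.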
[0070] Generative Diffusion Models (GDMs) learn data distribution by simulating the process of progressively injecting noise (forward diffusion) and iteratively denoising (backward diffusion), ultimately achieving high-quality sample generation. Their core idea is to learn complex data distribution by simulating the dynamic process of data moving from ordered to disordered (noise addition) and then back to ordered (denoising).
[0071] Step 1: Forward Diffusion Process:
[0072] Forward diffusion gradually adds Gaussian noise to the data through a Markov chain, transforming the original data into random noise.
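[0073] Given a data distribution $x_0 \sim q(x_0)$, the forward process is defined as a T-step Markov chain, where the noise injection at each step is described by a transition kernel $q(x_t \mid x_{t-1})$:
[0074] $q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})$;
[0075] $q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1 - \iota_t}\, x_{t-1},\ \iota_t I\right)$;
[0076] By reparameterizing, let $\lambda_t := 1 - \iota_t$ and $\bar{\lambda}_t := \prod_{s=1}^{t} \lambda_s$. Given input $x_0$, sample a Gaussian vector $\varepsilon \sim \mathcal{N}(0, I)$ to obtain the noisy data $x_t$ at any step t:
[0077] $x_t = \sqrt{\bar{\lambda}_t}\, x_0 + \sqrt{1 - \bar{\lambda}_t}\, \varepsilon$;
[0078] Then $x_t$ can be obtained from the following distribution:
[0079] $q(x_t \mid x_0) = \mathcal{N}\left(x_t;\ \sqrt{\bar{\lambda}_t}\, x_0,\ (1 - \bar{\lambda}_t) I\right)$;
[0080] Step 2: Reverse Diffusion Process:
[0081] Reverse diffusion recovers the data distribution step by step by training a neural network, a process that requires reconstructing the Markov chain in reverse. The neural network uses what it has learned to predict the noise at each step and then removes it. If the distribution $q(x_{t-1} \mid x_t)$ were available, one could start from $x_T \sim \mathcal{N}(0, I)$ and work backwards to create a new data point similar to the original dataset. However, calculating $q(x_{t-1} \mid x_t)$ requires complex calculations involving the data distribution, so $p_\theta$ is used as an approximation:
[0082] $p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$;
[0083] Here, θ represents the model parameters. Applying the reverse formula over all steps (also called the trajectory) traces back to the original data distribution. By doing this at each step, the model can learn to predict specific features, such as the data mean and distribution at each time step. Therefore, the trajectory from $x_T$ to $x_0$ is represented as:
[0084] $p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)$;
[0085] After introducing conditional information g (such as the environment state) into the denoising process, $p_\theta(x_{t-1} \mid x_t, g)$ can be modeled as a noise prediction model, and the covariance matrix and mean of the reverse process are:
[0086] $\Sigma_\theta(x_t, g, t) = \iota_t I$;
[0087] $\mu_\theta(x_t, g, t) = \frac{1}{\sqrt{\lambda_t}} \left( x_t - \frac{\iota_t}{\sqrt{1 - \bar{\lambda}_t}}\, \varepsilon_\theta(x_t, g, t) \right)$;
[0088] Then, the reverse diffusion chain parameterized by θ performs the following sampling:
[0089] $x_{t-1} = \mu_\theta(x_t, g, t) + \sqrt{\iota_t}\, z, \quad z \sim \mathcal{N}(0, I)$;
[0090] The model is optimized by minimizing the noise prediction error, and the original loss function can be simplified to:
[0091] $\mathcal{L}(\theta) = \mathbb{E}_{x_0, t, \varepsilon} \left[ \left\| \varepsilon - \varepsilon_\theta\left( \sqrt{\bar{\lambda}_t}\, x_0 + \sqrt{1 - \bar{\lambda}_t}\, \varepsilon,\ g,\ t \right) \right\|^2 \right]$;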
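The forward and reverse processes described above can be sketched in a few lines using the standard denoising diffusion formulation. Several parts are assumptions: the linear schedule for the per-step noise variance $\iota_t$, the use of the cumulative product of $\lambda_t = 1 - \iota_t$ for closed-form sampling, and the oracle noise that stands in for the trained, condition-aware network. The names `forward_sample` and `reverse_step` are hypothetical.

```python
import math
import random

T = 50
# ASSUMED linear variance schedule iota_1..iota_T (the source does not fix one).
iota = [1e-4 + (0.02 - 1e-4) * k / (T - 1) for k in range(T)]
lam = [1.0 - value for value in iota]  # lambda_t := 1 - iota_t
lam_bar = []                           # cumulative products of lambda_t
running = 1.0
for value in lam:
    running *= value
    lam_bar.append(running)


def forward_sample(x0, t, eps):
    """Closed-form forward diffusion: noisy x_t from x_0 and Gaussian eps (t is 1-based)."""
    a = math.sqrt(lam_bar[t - 1])
    b = math.sqrt(1.0 - lam_bar[t - 1])
    return [a * x + b * e for x, e in zip(x0, eps)]


def reverse_step(x_t, t, eps_hat, rng):
    """One denoising step x_t -> x_{t-1}, given a noise estimate eps_hat."""
    coef = iota[t - 1] / math.sqrt(1.0 - lam_bar[t - 1])
    mean = [(x - coef * e) / math.sqrt(lam[t - 1]) for x, e in zip(x_t, eps_hat)]
    if t == 1:
        return mean  # no fresh noise is added at the final step
    sigma = math.sqrt(iota[t - 1])  # reverse covariance iota_t * I
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


rng = random.Random(0)
x0 = [0.8, 0.3, 0.5]  # e.g. a normalised incentive-strategy vector (illustrative values)
eps = [rng.gauss(0.0, 1.0) for _ in x0]
x1 = forward_sample(x0, 1, eps)
# The true noise stands in for the network's prediction, so step 1 is inverted exactly.
recovered = reverse_step(x1, 1, eps, rng)
print(recovered)
```

In a trained model, `eps_hat` would come from a neural network that also receives the conditional information g (the environment state); generation would start from pure Gaussian noise and apply `reverse_step` from step T down to 1.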
[0092] As shown in Figure 1, this embodiment provides a layered incentive method for federated learning based on personalized privacy protection, including:
[0093] Step 1: TP-LMOs Layer: A method for constructing a multidimensional incentive policy set based on a generative diffusion model.
[0094] Currently, incentive schemes for federated learning between TP and LMOs face two key issues: First, LMOs may face privacy risks when uploading model parameters to TP, thus affecting their enthusiasm for participating in federated learning. Furthermore, different LMOs have varying degrees of privacy sensitivity; adopting a uniform incentive scheme could lead to unfair incentive distribution, thereby impacting overall collaboration efficiency. Second, due to differences in privacy preferences, LMOs may be unwilling to disclose the true scale of their data, or some LMOs may engage in dishonest reporting. This makes it difficult for TP to accurately grasp the effective amount of data contributed by each LMO, resulting in decreased accuracy of the global model built by TP without adequate real-world information support, and even failing to meet performance requirements. Contract theory is generally used to address such information asymmetry problems, but traditional contracts are not suitable for complex network scenarios with complex utility functions; that is, multidimensional conditions make traditional mathematical methods difficult to solve contract design problems or prone to getting trapped in local optima.
[0095] To address the two issues mentioned above, this invention employs a Generative Diffusion Model (GDM)-based approach to design personalized optimal incentive strategies, leveraging the GDM to overcome the high dimensionality and complexity of the formulated problem. This invention incentivizes each LMO to participate in federated learning by allowing it to add personalized noise to its model parameters. Simultaneously, this invention formulates differentiated incentive strategies based on each LMO's personalized privacy preferences and the amount of data used for training, ensuring high availability of the local model while protecting the privacy of the LMO's local model parameters. Furthermore, the GDM is an effective tool for solving multidimensional optimization problems, and the incentive strategy adopted in this invention needs to consider three parameters: each LMO's privacy preferences, the amount of data used for training, and the corresponding reward. To address the difficulty of solving traditional contract design problems in high-dimensional spaces, this invention proposes a multidimensional incentive strategy generation method based on the diffusion model. The incentive strategy designed by the TP is treated as a multidimensional vector, and optimized sampling is performed in its space through a diffusion-denoising mechanism, thereby generating a set of incentive strategies that maximizes the TP's utility under privacy budget constraints and from which different LMOs can select and execute. When generating incentive strategies, the TP explicitly stipulates the privacy budget for model upload, and this budget is a hard requirement for LMOs to fulfill their contracts. The specific process includes the following steps:
[0096] Step 1: Setting Incentive Strategy Goals (Defining Strategy Triads and Task Objectives):
[0097] In this invention, the TP aims to collect locally trained model parameters from multiple local model owner LMOs to improve the accuracy of the global model. Since different LMOs differ in privacy sensitivity, training capabilities, and resource costs, the TP struggles to accurately perceive their true type. Therefore, a multi-dimensional incentive strategy that satisfies incentive constraints needs to be designed to guide each LMO to choose the most suitable strategy for itself.
[0098] Step 1.1: LMO Private Type Modeling:
[0099] Each LMO has two unobservable private type parameters. $\theta_m$ represents the LMO's data purchasing power; a higher value indicates a greater willingness to purchase higher-quality data and lower costs associated with data acquisition. $\sigma_n$ represents the LMO's privacy preference level, with higher values indicating a greater concern for privacy. The types are arranged in non-decreasing order in each dimension, and LMOs are distinguished by these two parameters. Each LMO type is defined as a two-dimensional type pair:
[0100] $(\theta_m, \sigma_n),\ m = 1, 2, \ldots, M;\ n = 1, 2, \ldots, N$;
[0101] Step 1.2: Incentive Strategy Triple Design:
[0102] The incentive strategy that the TP designs for an LMO of type $(\theta_m, \sigma_n)$ is defined as follows:
[0103] $\rho_{m,n} = (l_{m,n}, \epsilon_{m,n}, r_{m,n})$;
[0104] Where: $l_{m,n}$ is the effective amount of data required for the LMO to train the local model. This data must be obtained by the LMO from lower-level Workers, and it corresponds to the number of samples that still contribute useful information during training after differential privacy noise has been added. $\epsilon_{m,n}$ is the differential privacy budget applied by the LMO when uploading the model. This value affects the noise level of the model; a larger value means less noise and higher model accuracy, but also a greater risk of privacy leakage. $r_{m,n}$ is the payment made by the TP to this type of LMO, compensating for its expenses such as purchasing data, local training, and privacy losses. This triple determines the LMO's training capability (through the amount of effective data), privacy leakage risk (through the privacy budget), and incentive strength (through the reward amount).
[0105] Step 1.3: Modeling Objective:
[0106] The goal of the TP is to generate incentive strategies that ensure participation and truthfulness: LMOs participate voluntarily (individual rationality, IR), each LMO chooses the strategy designed for its own type (incentive compatibility, IC), and the TP achieves maximum utility under its budget constraint.
[0107] Step 2: Utility Modeling of TP and LMO (Objective Function and Constraints):
[0108] Step 2.1: Utility function of TP:
[0109] When an LMO of type (θ_m, σ_n) selects the strategy (l_{p,q}, ε_{p,q}, r_{p,q}) designed for type (θ_p, σ_q) and uploads its model, the utility gained by the TP is:
[0110]
[0111] Here, the model return is calculated as:
[0112]
[0113] where δ and α are factors that determine how the data volume affects the model accuracy obtained by the TP (the federated learning server); γ measures the strength of the privacy budget's impact on model quality; the term log(1 + α·l_{p,q}) reflects that increasing the effective data volume improves model accuracy, but with diminishing marginal returns; and the remaining term represents the accuracy reduction caused by adding noise to the uploaded local model.
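The model-return formula itself is not legible in this text, so the following is only a minimal sketch under a stated assumption: the return is taken as δ·log(1 + α·l) minus a noise penalty γ/ε, and the TP utility as the return minus the payment r. The functional form, the function names, and the default coefficients are illustrative assumptions chosen to match the properties described above (diminishing returns in l, lower accuracy loss for a larger privacy budget); they are not the invention's formula.

```python
import math

def model_return(l, eps, delta=1.0, alpha=0.01, gamma=0.5):
    """Assumed model return: data gain with diminishing returns minus a noise penalty.

    The form delta*log(1 + alpha*l) - gamma/eps and the default coefficients
    are illustrative assumptions, not values taken from the source.
    """
    if l < 0 or eps <= 0:
        raise ValueError("l must be non-negative and eps must be positive")
    data_gain = delta * math.log(1.0 + alpha * l)   # grows with l, ever more slowly
    noise_penalty = gamma / eps                     # shrinks as the privacy budget grows
    return data_gain - noise_penalty

def tp_utility(l, eps, r, **coeffs):
    """Assumed TP utility for one LMO: model return minus the payment r."""
    return model_return(l, eps, **coeffs) - r
```

Under this assumed form, doubling the effective data volume adds less return than the first batch did, and relaxing the privacy budget from 1.0 to 2.0 halves the noise penalty.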
[0114] When an LMO of type (θ_m, σ_n) selects the strategy designed for its own type, the utility of the TP is:
[0115]
[0116] Overall utility of TP:
[0117]
[0118] Step 2.2: LMO's utility function:
[0119] After accepting the incentive strategy, the utility of LMO is:
[0120]
[0121] where c_pur is the unit cost of purchasing effective data; c_com is the computation and communication cost; σ_n·ε_{p,q} is the privacy cost; θ_m is the LMO's data purchasing power, where a higher value indicates a greater willingness to purchase higher-quality data and lower data acquisition costs; and σ_n is the LMO's privacy preference level, where a higher value indicates greater concern for privacy.
[0122] Step 2.3: Incentive and Constraint Conditions
[0123] For the incentive strategy to operate stably, the following must be met:
[0124] To ensure that an LMO obtains non-negative utility when it selects the strategy (l_{m,n}, ε_{m,n}, r_{m,n}), the individual rationality (IR) constraint must be satisfied:
[0125] The lower bound of utility;
[0126] To ensure that an LMO obtains its maximum utility by selecting the strategy (l_{m,n}, ε_{m,n}, r_{m,n}) designed for its own type rather than any other strategy, the incentive compatibility (IC) constraint must be satisfied:
[0127]
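The IR and IC conditions can be checked mechanically for any candidate strategy menu. The sketch below assumes an LMO utility of the form r − (c_pur/θ)·l − c_com·l − σ·ε, which is a guess consistent with the cost terms described in Step 2.2 (the exact formula is not legible here); `lmo_utility`, `check_menu`, and all coefficient values are hypothetical names and numbers introduced for illustration.

```python
from itertools import product

def lmo_utility(theta, sigma, strategy, c_pur=1.0, c_com=0.1):
    """Assumed LMO utility: payment minus purchase, compute, and privacy costs."""
    l, eps, r = strategy
    return r - (c_pur / theta) * l - c_com * l - sigma * eps

def check_menu(thetas, sigmas, menu, u_min=0.0):
    """Return (ir_ok, ic_ok) for a menu mapping (m, n) -> (l, eps, r).

    IR: every type earns at least u_min from its own strategy.
    IC: no type earns strictly more from a strategy designed for another type.
    """
    ir_ok, ic_ok = True, True
    for m, n in product(range(len(thetas)), range(len(sigmas))):
        own = lmo_utility(thetas[m], sigmas[n], menu[(m, n)])
        if own < u_min:
            ir_ok = False
        for other in menu.values():
            if lmo_utility(thetas[m], sigmas[n], other) > own + 1e-12:
                ic_ok = False
    return ir_ok, ic_ok
```

A menu that pays a high-capability type less than a low-capability type for the same workload passes IR but fails IC, because the high-capability LMO would prefer to misreport its type.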
[0128] Step 3: Transform the incentive strategy design problem into a GDM optimization problem:
[0129] Since traditional contract design methods are difficult to solve in high-dimensional spaces and are prone to getting trapped in local optima, this invention transforms the incentive strategy design task into a strategy vector parameter generation problem, and models and optimizes it using a generative diffusion model (GDM).
[0130] Step 3.1: Modeling the policy vector space:
[0131] Each incentive policy can be viewed as a vector, and the triples constitute the policy space:
[0132] ρ = (l_{m,n}, ε_{m,n}, r_{m,n}) ∈ R^3;
[0133] Step 3.2: GDM Training Process:
[0134] Starting from the initial input data, GDM runs an iterative forward diffusion process that gradually introduces Gaussian noise. It then runs a reverse diffusion process through a denoising network, which iteratively approximates the true samples ρ ~ Q(ρ) through a series of prediction steps, where Q(ρ) is the original data distribution. The denoising network is trained to reverse the noising process and recover the data, which enables the generation of new data.
[0135] Step 3.2.1 Forward diffusion stage (adding noise):
[0136] During the training phase, the system first applies a "perturbation" process to the initial samples. This process gradually degrades the incentive strategy vector into an approximately pure noise distribution by progressively adding standard Gaussian noise. Given ρ_0 ~ Q(ρ_0), the forward process in GDM can be represented as a Markov process of K steps. During forward diffusion, Gaussian noise is applied to the initial sample ρ_0, generating a series of samples (ρ_1, ..., ρ_K). The forward diffusion process of GDM is then described as follows:
[0137]
[0138] where ι_k ∈ (0,1) is the preset diffusion intensity; k is the diffusion step index; the sample at the k-th step follows a Gaussian distribution with mean μ_k and variance Σ_k; and I is the identity matrix, indicating that each dimension has the same standard deviation.
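As an illustration of the forward stage, the sketch below uses the standard denoising-diffusion step, in which ρ_k is drawn from a Gaussian with mean sqrt(1 − ι_k)·ρ_{k−1} and variance ι_k·I. That specific mean and variance are an assumption based on the common formulation, since the formula is not legible here; the schedule values and the example vector are hypothetical.

```python
import math
import random

def forward_diffusion(rho0, iotas, rng):
    """Return the chain [rho_0, rho_1, ..., rho_K] for a strategy vector rho0."""
    chain = [list(rho0)]
    for iota in iotas:
        prev = chain[-1]
        # The mean shrinks the previous sample; variance iota is shared by all dimensions.
        nxt = [math.sqrt(1.0 - iota) * x + math.sqrt(iota) * rng.gauss(0.0, 1.0)
               for x in prev]
        chain.append(nxt)
    return chain

rng = random.Random(0)
iotas = [0.02 * (k + 1) for k in range(20)]   # hypothetical schedule, each value in (0, 1)
chain = forward_diffusion([0.8, 0.5, 0.3], iotas, rng)   # normalized (l, eps, r) triple
```

Each entry of `chain` is a progressively noisier copy of the initial triple; after enough steps the sample carries almost no information about ρ_0.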
[0139] Step 3.2.2 Reverse diffusion stage:
[0140] The core of this stage is a neural network predictor that, at each step, predicts the noise component in the current incentive strategy and adjusts the strategy parameters accordingly. In other words, a trained neural network progressively predicts and removes noise to recover the incentive strategy vector ρ_0 from the Gaussian noise sample ρ_k.
[0141] The diffusion model network is denoted as π_ω(ρ|S), which uses weights ω to map the environment state to an incentive strategy design. The objective of π_ω(ρ|S) is to output a deterministic incentive policy that maximizes the expected cumulative reward over a series of time steps. The incentive policy design is expressed through the reverse process of the conditional diffusion model as follows:
[0142]
[0143] where
[0144] P_ω(ρ_{k-1} | ρ_k, S) = N(ρ_{k-1}; μ_ω(ρ_k, S, k), Σ_ω(ρ_k, S, k));
[0145]
[0146] Σ_ω(ρ_k, S, k) = ι_k I;
[0147] In incentive strategy modeling, the environment includes the factors that influence the optimal design of the strategy, defined as S = {L, M, N, ε_min, R, (θ_1, ..., θ_M), (σ_1, ..., σ_N)}. L represents the number of LMO participants in the current round, M represents the number of LMO categories by data purchasing power, N represents the number of categories by privacy preference, ε_min represents the minimum privacy budget of the incentive strategy set defined by the TP, and R represents the total incentive budget. In the formula, ε_ω is the incentive strategy generation network and λ_k := 1 − ι_k. The reverse diffusion chain parameterized by ω then performs the following sampling:
[0148]
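The sampling step is not legible in this text. A minimal sketch, assuming the standard denoising-diffusion update ρ_{k−1} = (ρ_k − ι_k / sqrt(1 − λ̄_k) · ε_ω(ρ_k, S, k)) / sqrt(λ_k) + sqrt(ι_k)·z, where λ̄_k is the cumulative product of λ_1..λ_k and z is standard Gaussian noise (omitted on the last step), could look as follows. `eps_net` is a hypothetical stand-in for the trained network ε_ω.

```python
import math
import random

def reverse_sample(rho_K, iotas, eps_net, state, rng):
    """Run the reverse chain from rho_K down to rho_0.

    eps_net(rho, state, k) is a stand-in for the trained network eps_omega.
    """
    lambdas = [1.0 - i for i in iotas]            # lambda_k := 1 - iota_k
    lam_bar, acc = [], 1.0
    for lam in lambdas:                           # cumulative products of lambda_k
        acc *= lam
        lam_bar.append(acc)
    rho = list(rho_K)
    for k in range(len(iotas), 0, -1):            # k = K, ..., 1
        iota, lam, lb = iotas[k - 1], lambdas[k - 1], lam_bar[k - 1]
        noise_hat = eps_net(rho, state, k)
        mean = [(x - iota / math.sqrt(1.0 - lb) * e) / math.sqrt(lam)
                for x, e in zip(rho, noise_hat)]
        # No fresh noise is added on the final step (k == 1).
        rho = [mu + (math.sqrt(iota) * rng.gauss(0.0, 1.0) if k > 1 else 0.0)
               for mu in mean]
    return rho
```

In use, `state` would carry the environment S described above, and the returned vector would be mapped back to a concrete (l, ε, r) triple.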
[0149] Therefore, the task of training the incentive policy π_ω in the complex, high-dimensional environment S is effectively transferred to the incentive strategy generation network ε_ω. To train ε_ω, an incentive strategy quality network Q_v is introduced. It maps an environment-strategy pair {S, ρ} to a value representing the expected cumulative reward if the LMO chooses an incentive strategy from the current state and follows it. The optimal incentive strategy design is therefore the one that maximizes the expected cumulative utility of the TP, which can be obtained through the following formula:
[0150]
[0151] The incentive strategy quality network is trained in the conventional way, using double Q-learning to minimize the Bellman error. Two quality networks, Q_{v1} and Q_{v2}, are constructed, together with three target networks, Q_{v1′}, Q_{v2′}, and π_{ω′}. The network parameters v1 and v2 are then optimized by minimizing the following objective:
[0152]
[0153] It is the discount factor. The incentive strategy design algorithm uses denoising techniques to generate the optimal strategy set. Subsequently, exploration noise is introduced into the strategy design and implemented to accumulate exploration experience.
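The objective is not legible here. As a hedged illustration of double Q-learning with target networks, the sketch below uses the common clipped variant, in which the target is the reward plus the discounted minimum of the two target-network values. Whether the invention uses exactly this variant is an assumption; the function names and the default discount are hypothetical.

```python
def td_target(reward, q1_next, q2_next, discount=0.95, done=False):
    """Clipped double-Q target: reward + discount * min(Q1', Q2')."""
    if done:
        return reward
    return reward + discount * min(q1_next, q2_next)

def critic_loss(q1, q2, target):
    """Sum of squared Bellman errors for the two quality networks."""
    return (q1 - target) ** 2 + (q2 - target) ** 2
```

Taking the minimum of the two target estimates counteracts the overestimation that a single quality network tends to accumulate.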
[0154] Step 4: TP generates a multidimensional optimal incentive strategy set and determines the privacy budget cap for each local model.
[0155] After completing Step 3, the TP has constructed a generation network ε_ω that can generate a multi-dimensional set of incentive strategies from the current environment state using the generative diffusion model.
[0156] In this step, the TP uses the trained GDM, taking the environment state of the current FL task S = {L, M, N, ε_min, R, (θ_1, ..., θ_M), (σ_1, ..., σ_N)} as input, to generate a set of optimal incentive strategies for the different LMO types, and distributes the set to the LMOs for selection.
[0157] To ensure overall model quality, the TP controls the privacy protection level of the entire system. During incentive strategy generation, a privacy budget lower bound ε_{m,n} is set for each LMO, to be adhered to when uploading the model in the current task round. In other words, the LMO cannot add more noise to the local model than this bound allows.
[0158] Step 2: LMO selects an appropriate incentive strategy and determines the local model's privacy budget constraint and effective data volume constraint:
[0159] After generating the incentive strategy set, the TP will publicly release the optimal incentive strategy set to all LMOs. Upon receiving the strategy set, each LMO will select the strategy that best suits its own conditions to maximize its utility, based on its own private characteristics (such as privacy preferences and data capabilities).
[0160] After selecting an incentive strategy, the LMO submits its response intention to the TP. Based on the received response information, the TP confirms the binding relationship between the LMO and the selected strategy, formally completing the signing process. Subsequently, the LMO will complete data collection, local training, and model uploading according to the requirements of the corresponding strategy. After accepting the model, the TP will pay the corresponding incentive compensation according to the incentive strategy agreement.
[0161] After completing data collection and local training, the LMO must apply differential privacy processing (usually noise addition) to the uploaded local model parameters with a privacy budget of no less than ε_{m,n}, meaning that the added noise cannot exceed what this constraint allows. At the same time, the TP's incentive strategy requires the LMO to collect an effective data volume of l_{m,n} for training; this volume depends on the amount of raw data collected by each Worker and on each Worker's privacy preference.
[0162] Step 3: LMO – Workers Layer: Construct a worker selection scheme and limit the privacy perturbation level of each worker based on the amount of effective data.
[0163] In federated learning architectures, the local model owner (LMO), acting as the executor of local model training, needs to complete a certain amount of local training according to the incentive strategy provided by the task publisher (TP). However, LMOs typically have limited data of their own and must purchase data from Workers to complete the training task. This process presents the following challenges. Although the addition of Workers alleviates the data shortage, their data contribution behavior is significantly affected by privacy risks, and Workers may refuse to provide data out of concern about data leakage. Furthermore, because Workers differ in their sensitivity to privacy protection, a uniform incentive scheme is difficult to implement. If the incentive cannot reasonably compensate for privacy losses, Workers may reduce data quality by adding excessive noise or withdraw from the federated learning task altogether, which degrades the LMO's local model training and further weakens the overall model performance. To match each Worker's heterogeneous privacy preferences and data contribution capabilities precisely, and to balance incentive fairness with system efficiency, a reverse selection scheme with personalized incentive capabilities becomes necessary.
[0164] To address these issues, this invention, building on the TP-LMOs incentive strategy design, proposes a privacy-preserving reverse screening incentive method for the LMO-Workers layer. Specifically, data procurement is modeled as a matching process in which the LMO initiates a data purchase task, Workers voluntarily declare their willingness to participate, and the LMO reverse-screens for high-performing, cost-effective Workers. Each Worker submits three parameters based on its privacy preferences and data collection capabilities: data volume, privacy budget, and compensation requirement. From the information submitted by each Worker, the LMO calculates the effective data volume, evaluates the cost-effectiveness, and selects a set of Workers that meets the requirements of the TP incentive strategy. Under this scheme, the LMO can dynamically balance data quality and privacy protection, procuring local training data at lower cost and supporting the coordinated optimization of the overall FL system's performance and privacy objectives. Taking an LMO of type (θ_m, σ_n) as an example, the specific process includes the following steps:
[0165] Step 1: Calculation method for effective data volume:
[0166] The effective data volume is typically related to the original data volume and the noise variance, which in turn is related to the privacy budget. In differential privacy, the noise variance is inversely proportional to the square of the privacy budget. Therefore, the effective data volume of each worker uploading perturbed original data is:
[0167]
[0168] where c is the task-related noise coefficient (a constant reflecting the sensitivity of the task and the differential privacy mechanism used; in many studies it is taken as 1 or another empirical constant); d_i is the amount of raw data provided by Worker i; and ε_i is Worker i's privacy budget.
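The effective-data formula is not legible in this text. One form consistent with the description (noise variance proportional to c/ε², more noise meaning fewer effective samples) is d_i / (1 + c/ε_i²); the sketch below uses that form purely as an assumption, and `effective_data` is a hypothetical helper name.

```python
def effective_data(d, eps, c=1.0):
    """Assumed effective data volume: raw volume discounted by the DP noise level.

    d   : raw data volume offered by the Worker
    eps : the Worker's privacy budget (must be positive)
    c   : task-related noise coefficient
    """
    if d < 0 or eps <= 0 or c < 0:
        raise ValueError("d and c must be non-negative and eps must be positive")
    return d / (1.0 + c / (eps ** 2))
```

Under this assumed form, 400 raw samples count as 200 effective samples at ε = 1 and as 320 at ε = 2, and the discount vanishes as ε grows.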
[0169] Step 2: Utility Function Design:
[0170] After an LMO of type (θ_m, σ_n) accepts the incentive strategy corresponding to its type, its objective for this round is to train the model with an effective data volume of at least l_{m,n} and to add noise to the model with a privacy budget of no less than ε_{m,n} before uploading it to the TP. To this end, the LMO selects from the set of Workers who have applied to sell data. The goal of the selection is to maximize the LMO's own utility by minimizing the data procurement cost, while satisfying the constraints of the chosen incentive strategy and keeping Workers willing to participate.
[0171] Workers submit bids based on their own circumstances, including the amount of data d_i they are willing to provide, the privacy budget ε_i they can accept, and the payment p_i they expect to receive from the LMO. The utility function of Worker i is:
[0172]
[0173] where the two cost coefficients are, respectively, the unit data acquisition cost of Worker i and the unit privacy leakage cost of Worker i. The participation constraint of Worker i requires its utility to be no less than a lower bound, which is generally taken as 0.
[0174] After LMO selects the corresponding incentive strategy, the utility is:
[0175]
[0176] where the unit effective data procurement cost c_pur is the average cost derived from the bids submitted by the winning Workers.
[0177] The specific formula is as follows:
[0178] where W_{m,n} is the set of winning Workers of the LMO of type (θ_m, σ_n), and the sum of the payments p_i over W_{m,n} is the total bid price of the winning Workers.
[0179] Step 3: LMO initiates a screening task:
[0180] After an LMO of type (θ_m, σ_n) accepts the incentive strategy (l_{m,n}, ε_{m,n}, r_{m,n}) set by the TP, it proceeds with local data procurement. The procured data must have a total effective data volume of not less than l_{m,n}. However, no uniform privacy budget is imposed on the Workers; each Worker determines its own privacy budget according to its preferences. The LMO broadcasts the screening task to its set of Workers and specifies that each Worker must submit a bid triple: b_i = (d_i, ε_i, p_i).
[0181] Step 4: Workers submit participation applications:
[0182] Each Worker submits a bid triple to the LMO based on its own resources and privacy concerns: b_i = (d_i, ε_i, p_i). The goal of a Worker is to maximize its probability of winning while ensuring that its own utility is non-negative.
[0183] Step 5: LMO calculates cost-effectiveness and selects successful bidders:
[0184] When the LMO selects Workers, it cannot look only at the raw data volume d_i; it must instead consider the effective data volume that each Worker provides under its privacy budget ε_i. The effective data volume formula reflects the joint constraint of the Worker's privacy budget and raw data volume. For each Worker i, the effective data volume is calculated from its application information:
[0185]
[0186] Then, evaluate the unit effective data procurement cost for each Worker i:
[0187]
[0188] The smaller this value, the lower the cost per unit of effective data.
[0189] Screening logic: sort the Workers by unit effective data cost in ascending order (a lower cost per unit of effective data is better); select Workers in sequence to form the set W_{m,n}, which must provide a total effective data volume of not less than l_{m,n}. If multiple combinations meet this requirement, the one with the lowest total payment can be chosen.
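The screening logic above can be sketched as a greedy selection. The effective-data helper reuses an assumed form d / (1 + c/ε²), which is not confirmed by the source; `screen_workers`, the bid values, and the worker identifiers are hypothetical.

```python
def effective_data(d, eps, c=1.0):
    """Assumed effective data volume (illustrative form, not from the source)."""
    return d / (1.0 + c / eps ** 2)

def screen_workers(bids, l_required, c=1.0):
    """Greedy reverse screening over bids {worker_id: (d_i, eps_i, p_i)}.

    Returns (winners, total_effective_data, total_payment), or None when even
    all bidders together cannot reach l_required.
    """
    scored = []
    for wid, (d, eps, p) in bids.items():
        eff = effective_data(d, eps, c)
        if eff > 0:
            scored.append((p / eff, wid, eff, p))   # unit effective data cost first
    scored.sort()                                    # ascending unit cost
    winners, total_eff, total_pay = [], 0.0, 0.0
    for _, wid, eff, p in scored:
        if total_eff >= l_required:
            break
        winners.append(wid)
        total_eff += eff
        total_pay += p
    if total_eff < l_required:
        return None
    return winners, total_eff, total_pay

bids = {"w1": (400, 1.0, 1200.0), "w2": (400, 2.0, 1280.0), "w3": (100, 1.0, 500.0)}
result = screen_workers(bids, l_required=500)
```

The greedy pass yields one feasible set; comparing it with other feasible combinations by total payment, as the text allows, would require an additional search step that this sketch omits.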
[0190] Step 4: Each Worker perturbs its data according to its application information and uploads it to the LMO:
[0191] Worker i applies differential privacy noise to its data of volume d_i according to the privacy budget ε_i declared in its application:
[0192]
[0193] After the noise is added, the data is uploaded to the LMO, and Worker i receives the payment p_i.
[0194] Step 5: LMO completes local model training based on perturbed data and uploads the perturbed model.
[0195] LMO merges all winning worker data to construct a training set:
[0196]
[0197] The LMO uses this dataset for local model training, adds noise to the model parameters according to the privacy budget ε_{m,n} in the incentive strategy, and then uploads them to the TP to complete this round of tasks.
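Claim 5 states that the LMO layer adds Laplace noise to the model parameters. A minimal sketch of that step, assuming the usual Laplace mechanism with scale equal to sensitivity divided by the privacy budget, is shown below; the sensitivity value, the function name, and the seed handling are assumptions made for illustration.

```python
import random

def perturb_parameters(params, eps, sensitivity=1.0, seed=None):
    """Add Laplace(0, sensitivity/eps) noise to each model parameter.

    A Laplace sample is drawn as the difference of two Exp(1) samples,
    multiplied by the scale.
    """
    if eps <= 0:
        raise ValueError("the privacy budget eps must be positive")
    rng = random.Random(seed)
    scale = sensitivity / eps      # a larger budget gives a smaller noise scale
    return [w + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
            for w in params]
```

Because the scale is inversely proportional to ε_{m,n}, a lower bound on the privacy budget is equivalent to an upper bound on the noise added to the uploaded model.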
[0198] The TP receives the noise-perturbed model parameters uploaded by each LMO, then evaluates and aggregates them to obtain a global model.
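The aggregation rule is not specified in this passage. As one plausible illustration, the sketch below applies a weighted average of the uploaded parameter vectors, in the style of federated averaging; weighting by each LMO's effective data volume is an assumption, and `aggregate` is a hypothetical name.

```python
def aggregate(models, weights=None):
    """Weighted average of equally sized parameter vectors (one per LMO)."""
    if not models:
        raise ValueError("at least one model is required")
    n = len(models[0])
    if any(len(m) != n for m in models):
        raise ValueError("all models must have the same number of parameters")
    if weights is None:
        weights = [1.0] * len(models)   # plain average when no weights are given
    total = sum(weights)
    return [sum(w * m[j] for w, m in zip(weights, models)) / total
            for j in range(n)]
```

For example, the weights could be the l_{m,n} values of the strategies the LMOs selected, so that models trained on more effective data contribute more to the global model.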
[0199] This experiment, based on a user privacy-preserving federated learning hierarchical incentive framework (TP–LMO–Worker), uses real datasets (EMNIST and Fashion-MNIST) for numerical simulation to evaluate the impact of four methods on system performance. The system architecture includes one Task Publisher (TP), five Local Model Organizers (LMOs), and 100 terminal participants (Workers), with each LMO managing 20 Workers. The experiment compares four schemes in terms of model accuracy: uniform pricing, discriminatory pricing, the hierarchical incentive mechanism (HUMA), and the method of this invention. The simulation trains a CNN model in the federated setting, with a learning rate of 0.01, a batch size of 32, and non-IID data partitioning (Dirichlet distribution, α = 0.5). The experiment focuses on how model accuracy varies with training epochs under the different strategies, as shown in Figure 2.
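The non-IID split mentioned above can be illustrated with a common label-wise Dirichlet partition. The sketch below is an assumption about how such a split is typically produced, not a description of the experiment's actual code; the toy labels, the client count, and the function name are hypothetical.

```python
import random

def dirichlet_partition(labels, num_clients, alpha=0.5, seed=0):
    """Split sample indices across clients with per-class Dirichlet(alpha) proportions."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Dirichlet proportions obtained by normalizing Gamma(alpha, 1) draws.
        g = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(g)
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += g[c] / total
            end = len(idxs) if c == num_clients - 1 else int(round(cum * len(idxs)))
            end = min(max(end, start), len(idxs))
            clients[c].extend(idxs[start:end])
            start = end
    return clients

labels = [i % 10 for i in range(200)]          # toy labels standing in for a dataset
parts = dirichlet_partition(labels, num_clients=5, alpha=0.5)
```

A smaller α concentrates each class on fewer clients, which makes the local distributions more heterogeneous.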
[0200] Uniform Pricing: TP provides the same unit price for data to all LMOs, and LMOs also purchase worker data at a uniform price standard, without considering the personalized characteristics of LMOs or workers (such as data capabilities or privacy preferences) to differentiate pricing.
[0201] Discriminatory pricing: TPs set different prices per unit of data provided by different types of LMOs. LMOs, in turn, set different prices per unit of data provided by different types of workers. In practice, this pricing method faces difficulties in implementation because TPs cannot accurately know the private attributes of each LMO, and LMOs cannot accurately know the private attributes of each worker, such as data collection capabilities and data quality.
[0202] Hierarchical Incentive Pricing (HUMA): HUMA differs from the traditional pricing methods. The TP uses a Stackelberg game to determine the total payment τ to each LMO. Based on its own circumstances and the outcome of its game with the TP, the LMO applies contract theory, subject to individual rationality and incentive compatibility constraints, to formulate differentiated rewards and data volume requirements for different Workers, thereby pricing the Workers' data.
[0203] An application example of this invention is as follows:
[0204] 1. Scene Background
[0205] In a provincial-level smart healthcare system, a regional medical data platform aims to collaborate with several tertiary hospitals and community health service centers within the province to jointly train a lung disease recognition model based on X-ray images. The platform does not directly access the raw data; instead, it coordinates training through the tertiary hospitals and their affiliated community healthcare sites to ensure model effectiveness while respecting patient privacy.
[0206] Each community healthcare site stores local patients' lung X-ray images. This data is highly privacy-sensitive and varies in terms of equipment performance, data volume, and willingness to protect privacy. Directly forcing sites to upload perturbation models or anonymized data could easily reduce participation and even cause training failure. To address this issue, the platform introduces personalized privacy protection methods and a tiered incentive selection strategy to improve training efficiency and privacy security.
[0207] TP (Task Publisher): A regional healthcare platform that sets task objectives and reward rules, and aggregates global models;
[0208] LMO (Local Model Organizer): Tertiary hospitals organize their affiliated community clinics to participate in training;
[0209] Worker (Edge Terminal): Community healthcare sites or mobile diagnostic devices that hold raw image data.
[0210] 2. Implementation Cases
[0211] Step 1: The platform develops participation plans at multiple levels and distributes them to various tertiary hospitals.
[0212] The regional healthcare platform sets the model accuracy target for this task and formulates participation schemes at several levels through a generative diffusion model. For example, the Level 1 scheme requires an effective data volume of approximately 10,000 perturbed images, with a model privacy budget of no less than 1.2 and a maximum reward of 20,000 yuan; the Level 2 scheme requires approximately 8,000 perturbed images, with a model privacy budget of no less than 1.0 and a maximum reward of 17,000 yuan; and the Level 3 scheme requires approximately 6,000 perturbed images, with a model privacy budget of no less than 0.8 and a maximum reward of 12,000 yuan. These schemes specify the scale of effective training data that each tertiary hospital must provide and stipulate that a certain degree of differential privacy perturbation must be added before the local model is uploaded, to protect the data source. Each scheme corresponds to a different reward standard, and all schemes are announced uniformly by the platform. Tertiary hospitals can independently choose the most suitable scheme based on their own resources and the capabilities of their subordinate medical sites, and submit their choice to the platform for confirmation. For example, a People's Hospital in Chengdu has strong data capabilities and relatively lenient privacy controls, so the Level 1 scheme is recommended for it, while a People's Hospital in Liangshan Prefecture has a high privacy preference, so the Level 3 scheme, with its lower privacy budget, is recommended for it.
[0213] Step 2: Tertiary hospitals issue task notifications to medical sites within their jurisdiction.
[0214] After selecting a scheme, the tertiary hospital sends a task announcement to the community health stations under its jurisdiction. The announcement specifies the requirements for participation, including the required data volume, the acceptable level of privacy protection (i.e., the degree of perturbation), and the compensation each station can apply for. Community health stations decide whether to participate based on the number of images they hold, the sensitivity of their data, and their computing power. They then report the amount of data they are willing to provide, their acceptable level of perturbation, and their desired compensation. For example, a community health service center in Jinjiang District, Chengdu, declares that it is willing to contribute 400 images, perturbed with a privacy budget of ε = 1.2, at a quoted price of 3 yuan per image.
[0215] Step 3: Tertiary hospitals select medical sites to form a training alliance
[0216] After receiving responses from the sites, the tertiary hospital evaluated the submissions from all sites. Priority was given to sites with high-quality data, willingness to provide training results within a controllable privacy budget, and reasonable pricing to form the training dataset. This process ensured sufficient training data was obtained without violating the platform's privacy control limits. Selected sites received confirmation of eligibility and access to upload data. For example, a hospital in the High-tech Zone selected four sites with large datasets and reasonable privacy budgets based on their quotes, combining them to provide approximately 10,000 images, sufficient for the Level 1 solution.
[0217] Step 4: The site completes the disturbance and submits the disturbance data.
[0218] Selected sites will receive eligibility confirmation and a data upload channel. After confirmation, each community healthcare site will perturb its local data according to the privacy budget specified in the application before uploading the perturbed data to the tertiary hospital.
[0219] Step 5: Tertiary hospitals complete local model training and perturbation and upload the results to the platform.
[0220] Tertiary hospitals aggregate the perturbed data submitted by all sites and use the perturbed X-ray images to train the model. After training, the tertiary hospitals perturb the model parameters or gradients according to their committed privacy budgets, and then submit the perturbed model results to the platform. The entire process does not require uploading the original images, greatly reducing the risk of patient privacy breaches.
[0221] Step 6: The platform completes the aggregation of perturbation models.
[0222] After receiving all the perturbation models submitted by hospitals, the platform aggregates and integrates them into a global model. This global model is then distributed for iterative training. Once the training accuracy reaches a set standard, the platform will distribute corresponding rewards to each hospital according to the incentive amount specified in their task completion plan. Tertiary hospitals can further distribute rewards downwards based on the unit price specified in their site application.
[0223] The above are merely preferred embodiments of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. A federated learning hierarchical incentive method based on personalized privacy protection, characterized in that it comprises: the task publisher (TP) uses a generative diffusion model to jointly analyze the privacy preferences and data purchasing power of the local model owners (LMOs) and generates a set of triple incentive strategies, each comprising an effective data volume, a privacy budget, and a reward; each LMO selects the incentive strategy corresponding to its own type and, subject to the constraints of that strategy, screens for a combination of Workers that meets the effective data volume requirement; each Worker adds noise to its raw data according to its declared privacy budget and submits the data to the corresponding LMO; the LMO aggregates the noisy data to train a local model, perturbs the model parameters according to the privacy budget constraint in the selected incentive strategy, and uploads them to the TP; wherein the Worker screening process comprises: calculating, for each Worker i, the cost per unit of effective data from the payment p_i that the Worker expects to receive from the LMO and the Worker's effective data volume; and selecting the Worker combination by sorting these unit costs in ascending order; and wherein the effective data volume is determined by d_i, ε_i, and c, where d_i is the amount of raw data provided by Worker i, ε_i is Worker i's privacy budget, and c is the task-related noise coefficient.
2. The federated learning hierarchical incentive method based on personalized privacy protection as described in claim 1, characterized in that the training process of the generative diffusion model comprises: in the forward diffusion stage, gradually adding Gaussian noise to the incentive strategy vector; and in the reverse diffusion stage, recovering the optimal incentive strategy distribution through a neural network; wherein the environment state input includes the number of LMO participants, the number of categories, the minimum privacy budget in the incentive strategy, and the total incentive budget.
3. The federated learning hierarchical incentive method based on personalized privacy protection as described in claim 1, characterized in that the generation conditions for the triple incentive strategy include: the privacy budget must be greater than the lower threshold set by the TP; the reward is positively correlated with the effective data volume; and the total incentive amount for a single LMO shall not exceed a preset proportion of its data purchasing power.
4. The federated learning hierarchical incentive method based on personalized privacy protection as described in claim 1, characterized in that the privacy budget constraints include: the TP sets a privacy budget lower bound for model uploads for each type of LMO through the personalized incentive strategies; Workers independently declare their data privacy budgets; and, during LMO screening, the total effective data volume after differential privacy processing must not be less than the effective data requirement in the LMO's selected incentive strategy, so that the effective data volume indirectly constrains the Workers' privacy budgets.
5. The federated learning hierarchical incentive method based on personalized privacy protection as described in claim 1, characterized in that the data noise addition process comprises: the Worker layer adds noise that follows the specified distribution to the raw data; and the LMO layer adds Laplace noise to the model parameters in accordance with differential privacy requirements.