A user privacy regression modeling method fusing multi-platform data

By combining a hierarchical attention mechanism with an exponential decay function to allocate a personalized privacy budget, this approach addresses the problem in existing privacy modeling methods where important platforms and highly sensitive features are forced to endure excessive noise. It achieves dynamic matching between data importance and protection strength, thereby improving the model's accuracy and privacy protection level.

CN121365381B · Active · Publication Date: 2026-06-23 · BEIJING HONGTU XINDA TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING HONGTU XINDA TECH CO LTD
Filing Date
2025-12-04
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing privacy modeling methods cannot differentiate based on differences in data quality or feature sensitivity across different platforms. This results in important platforms and highly sensitive features being subjected to excessive noise, while areas with lower noise levels suffer from insufficient privacy protection. The overall model utility and privacy protection level are traded off in a suboptimal way. Furthermore, attention mechanisms and privacy budget allocation are independent of each other, making it impossible to dynamically match data importance with protection strength.

Method used

A hierarchical attention mechanism is adopted to dynamically allocate the privacy budget. Cross-platform balancing is achieved through a global privacy budget pool. Personalized privacy budget allocation is realized by combining an exponential decay function and a global reallocation mechanism. Adaptive noise is added to distributed regression training, and a global regression model is formed through secure aggregation. The attention weights and the privacy budget allocation are then adjusted by feeding the generalization error back to them.

Benefits of technology

It achieves personalized protection of important platforms and sensitive features, improves the model's prediction accuracy and privacy protection level, optimizes the overall utility-privacy trade-off, and enhances the fairness of data contributions and training stability.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a user privacy regression modeling method fusing multi-platform data and relates to the technical field of privacy modeling. The method comprises the following steps: preprocessing user behavior data with a hierarchical attention mechanism and outputting a feature vector; dynamically allocating a privacy budget with an exponential decay function according to the attention weights generated by the hierarchical attention mechanism, and balancing the budget across platforms through a global privacy budget pool to obtain a personalized privacy budget allocation scheme; performing distributed regression training locally at each data source based on the feature vector and the personalized privacy budget allocation scheme, adding noise adapted to the privacy budget to the gradient, and forming a global regression model through secure aggregation. By aggregating the local generalization errors of each platform into a global error signal and backpropagating that signal to the attention layer and the exponential decay function in a closed loop, the application achieves real-time coordinated adjustment of attention weights and privacy noise intensity.

Description

Technical Field

[0001] This invention relates to the field of privacy modeling technology, and in particular to a user privacy regression modeling method that integrates data from multiple platforms.

Background Technology

[0002] With the rapid development of the digital economy, user behavior data exhibits significant multi-platform distribution characteristics, containing rich individual profile information. In regression modeling scenarios such as credit assessment, precision marketing, and health risk prediction, data from a single platform often suffers from insufficient sample coverage and limited feature dimensions, making it difficult to meet the demand for high-precision continuous value prediction. To overcome the limitations of data silos, federated learning, as a typical distributed machine learning paradigm, has been widely studied. Its core idea is to achieve multi-party collaborative training through the secure aggregation of model parameters or gradients without directly exchanging raw data. Theoretically, differential privacy mechanisms can be combined to add noise to the gradients to provide strict mathematical privacy guarantees. In recent years, horizontal federated learning, vertical federated learning, and secure multi-party computation technologies based on secret sharing or homomorphic encryption have gradually matured and achieved certain application results in fields such as financial risk control and medical auxiliary diagnosis.

[0003] However, existing privacy modeling methods still have room for improvement: First, most solutions adopt a uniform privacy budget allocation strategy for all participants or all features, which cannot differentiate based on differences in data quality or feature sensitivity across different platforms. This results in important platforms and highly sensitive features being forced to bear excessive noise, while areas with lower noise may experience insufficient privacy protection, ultimately leading to a suboptimal trade-off between overall model utility and privacy protection level. Second, existing privacy modeling methods typically use attention mechanisms only for feature selection or internal model representation learning, without establishing a closed-loop feedback between them and privacy budget allocation. Attention weights and privacy noise intensity are adjusted independently, making it difficult to achieve a dynamic match between data importance and protection strength.

Summary of the Invention

[0004] In view of the aforementioned existing problems, the present invention is proposed.

[0005] Therefore, this invention provides a user privacy regression modeling method that integrates data from multiple platforms, to solve the problem that existing methods cannot treat platforms differently according to differences in data quality or feature sensitivity.

[0006] To solve the above-mentioned technical problems, the present invention provides the following technical solution:

[0007] This invention provides a user privacy regression modeling method that integrates data from multiple platforms, comprising:

[0008] User behavior data is collected from multiple data sources, and a hierarchical attention mechanism is used to preprocess the user behavior data to output a feature vector with uniform dimensions.

[0009] Based on the attention weights generated by the hierarchical attention mechanism, a privacy budget is dynamically allocated using an exponential decay function, and a cross-platform balance is achieved through a global privacy budget pool to obtain a personalized privacy budget allocation scheme.

[0010] Based on the feature vector and the personalized privacy budget allocation scheme, distributed regression training is performed locally on each data source. After adding noise that is adaptive to the privacy budget to the gradient, a global regression model is formed through secure aggregation.

[0011] The global regression model is iteratively optimized, and the attention weights and privacy budget allocation are adjusted through feedback from the generalization error of local cross-validation on each platform, to obtain the user privacy regression model and evaluation report.

[0012] As a preferred embodiment of the user privacy regression modeling method integrating multi-platform data described in this invention, the output feature vector with unified dimensions is specifically:

[0013] User behavior data from different data sources are input into the platform-level attention layer. The global importance score of each platform is calculated and normalized using learnable weight vectors to obtain the platform attention weights.

[0014] All features from each data source are input into the feature-level attention layer. The contribution score of each feature to the regression task is calculated and normalized to obtain the feature attention weights.

[0015] The feature vectors of each platform are weighted and fused using platform attention weights, and then the features within each data point are weighted a second time using feature attention weights to obtain preliminary fused features.

[0016] Batch standardization and median missing value imputation are performed on the preliminary fused features to output a feature vector with uniform dimensions.

[0017] As a preferred embodiment of the user privacy regression modeling method integrating multi-platform data described in this invention, the method involves dynamically allocating the privacy budget using an exponential decay function based on the attention weights generated by the hierarchical attention mechanism, and balancing it across platforms through a global privacy budget pool to obtain a personalized privacy budget allocation scheme. Specifically:

[0018] Arrange the platform attention weights in order of data source, then append the feature attention weights in each data source in order of feature to the platform attention weights, and concatenate them into a joint attention vector;

[0019] Each element in the joint attention vector is mapped by applying exponential decay to obtain the corresponding local privacy budget, which is then aggregated into the global privacy budget pool.

[0020] Based on the proportion of the remaining total budget in the global privacy budget pool, the privacy budgets of each platform are redistributed and adjusted to output a personalized privacy budget allocation scheme.

[0021] As a preferred embodiment of the user privacy regression modeling method integrating multi-platform data described in this invention, the step of performing distributed regression training locally on each data source and adding noise adaptive to the privacy budget to the gradient specifically involves:

[0022] Before distributed regression training begins, a global regression model with the same structure as the local model of each data source is initialized. Each data source uses a multi-layer neural network and receives the current global regression model parameters as the initial parameters of its local regression model.

[0023] Each data source uses local user behavior data and feature vectors to perform forward propagation and loss calculation to obtain local gradients. Based on the personalized privacy budget corresponding to each data point in the current batch, the noise intensity is determined, noise is added to the local gradients, and gradient clipping is performed to obtain clipped and noisy gradients.

[0024] As a preferred embodiment of the user privacy regression modeling method integrating multi-platform data described in this invention, the step of forming a global regression model through secure aggregation specifically involves:

[0025] The clipped and noisy gradients uploaded from each data source are divided into multiple secret shares and distributed to multiple aggregation nodes using a secret sharing method.

[0026] Each aggregation node performs a secure summation operation in an encrypted state on its secret share of the same location to obtain the encrypted aggregation gradient share;

[0027] Multiple aggregation nodes collaborate to reconstruct the plaintext average gradient corresponding to the encrypted aggregate gradient share, and update the parameters of the current global regression model to generate a new round of global regression model.

[0028] As a preferred embodiment of the user privacy regression modeling method integrating multi-platform data described in this invention, the step of iteratively optimizing the global regression model and adjusting the attention weights and privacy budget allocation in reverse based on the generalization error of local cross-validation on each platform to obtain the user privacy regression model is as follows:

[0029] After each new round of global regression model is released, each platform uses the local validation set to calculate the regression mean square error as the generalization error, and summarizes the generalization errors of all platforms into a global error signal.

[0030] By backpropagating the global error signal, the platform attention weights and feature attention weights in the hierarchical attention mechanism are updated synchronously, and the decay coefficient of the exponential decay function is also updated.

[0031] Training stops when the global validation loss no longer decreases after N consecutive rounds, and the finally converged user privacy regression model is output.
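The stopping rule in [0029] to [0031] can be illustrated with a minimal sketch. The text does not say how the per-platform errors are combined into the global error signal, so a plain mean is assumed here; the function names and the `patience` parameter (standing in for N) are hypothetical.

```python
def global_error(local_mse):
    """Combine per-platform validation MSE values into one signal.

    Assumption: a simple unweighted mean; the source does not fix the aggregator.
    """
    return sum(local_mse) / len(local_mse)


def should_stop(history, patience):
    """Return True when the last `patience` rounds set no new minimum.

    history  -- global validation losses, one entry per completed round
    patience -- number of consecutive non-improving rounds tolerated (N)
    """
    if len(history) <= patience:
        return False  # not enough rounds to judge yet
    best_before = min(history[:-patience])
    return min(history[-patience:]) >= best_before


losses = [1.0, 0.8, 0.8, 0.9, 0.85]
print(global_error([1.0, 3.0]), should_stop(losses, patience=3))
```

A loss that merely ties the earlier minimum counts as "no longer decreasing" in this sketch; a stricter or looser tolerance would be an equally reasonable reading.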

[0032] As a preferred embodiment of the user privacy regression modeling method that integrates multi-platform data as described in this invention, the evaluation report includes the mean squared error of the finally converged user privacy regression model on the test sets of each platform, the accuracy gap with the model without privacy protection, the success rate of membership inference attacks, and the total privacy budget consumed.

[0033] As a preferred embodiment of the user privacy regression modeling method that integrates multi-platform data according to the present invention, the step of adding noise to the local gradient and performing gradient clipping refers to performing L2 norm gradient clipping on the local gradient according to a preset gradient clipping threshold, and adding Gaussian noise of corresponding intensity to the clipped gradient.

[0034] As a preferred embodiment of the user privacy regression modeling method that integrates multi-platform data as described in this invention, the global privacy budget pool redistributes the unused remaining budget to the next round according to the attention weight ratio of the current round after each iteration.

[0035] As a preferred embodiment of the user privacy regression modeling method integrating multi-platform data described in this invention, the step of redistributing and adjusting the local privacy budgets across platforms according to the proportion of the remaining total budget in the global privacy budget pool, and outputting a personalized privacy budget allocation scheme for each data point, specifically includes:

[0036] The global reallocation coefficient is obtained by calculating the ratio of the current remaining budget in the global privacy budget pool to the sum of all local privacy budgets.

[0037] The adjusted local privacy budget is obtained by multiplying each local privacy budget by the global reallocation coefficient.

[0038] Update the sum of all adjusted local privacy budgets to the current remaining total budget, and map the adjusted local privacy budgets back to each corresponding data point, outputting a personalized privacy budget allocation scheme.

[0039] The beneficial effects of this invention are as follows: By combining a hierarchical attention mechanism with an exponential decay function to achieve a personalized privacy budget dynamic allocation mechanism, and a global reallocation and cross-round reuse mechanism for the global privacy budget pool, this invention completely overcomes the suboptimal overall utility-privacy trade-off caused by the unified privacy budget allocation strategy commonly used in existing privacy modeling methods. This strategy forces important platforms and highly sensitive features to bear excessive noise while providing insufficient protection for low-sensitivity parts. At the same time, by aggregating the local generalization errors of each platform into a global error signal and backpropagating it to the closed-loop optimization of the attention layer and the exponential decay function, this invention achieves real-time coordinated adjustment of attention weights and privacy noise intensity. This fundamentally solves the shortcomings of existing methods where the attention mechanism is only used for feature representation and is independent of privacy budget allocation, and cannot dynamically match data importance and protection strength.

Attached Figure Description

[0040] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the following description of the embodiments will be briefly introduced. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0041] Figure 1 is a flowchart of the user privacy regression modeling method that integrates data from multiple platforms.

[0042] Figure 2 is a flowchart of the dynamic allocation of the privacy budget.

[0043] Figure 3 is a flowchart of distributed regression training and secure aggregation.

[0044] Figure 4 is a flowchart of the iterative optimization.

Detailed Implementation

[0045] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0046] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the invention. Therefore, the invention is not limited to the specific embodiments disclosed below.

[0047] Secondly, the term "one embodiment" or "embodiment" as used herein refers to a specific feature, structure, or characteristic that may be included in at least one implementation of the present invention. The phrase "in one embodiment" appearing in different places in this specification does not necessarily refer to the same embodiment, nor is it a single or selective embodiment that is mutually exclusive with other embodiments.

[0048] Referring to Figures 1-4, as one embodiment of the present invention, this embodiment provides a user privacy regression modeling method that integrates data from multiple platforms, including the following steps:

[0049] S1: Collect user behavior data from multiple data sources, preprocess the user behavior data using a hierarchical attention mechanism, and output a feature vector with uniform dimensions.

[0050] S1.1: User behavior data from different data sources (such as social media platforms, e-commerce platforms, and mobile applications) enter the platform-level attention layer. The platform-level attention layer maintains a learnable weight vector for each data source. It averages all features from each data source to obtain an average feature vector, and multiplies this average feature vector with the learnable weight vector to obtain the raw global importance score. Softmax normalization is then performed on the raw global importance scores from all data sources to obtain the platform attention weight. User behavior data includes: clickstream features, transaction record features, social interaction features, time series features, and device and environment features.

[0051] All features within each data source enter the feature-level attention layer, which maintains a global query vector within the data source. For each single feature vector in the data source (decomposed from the original feature vector), the contribution raw score is obtained by multiplying it with the global query vector. Softmax normalization is performed on the contribution raw scores of all features within the data source to obtain the feature attention weights.
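The two attention layers of S1.1 can be sketched as follows. This is an illustrative reading only: it assumes every feature inside a data source is a vector of the same length (so that they can be averaged), and the function names, the weight vectors, and the query vector are hypothetical placeholders for learned parameters.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def platform_attention(platform_features, platform_weights):
    """Platform-level weights: mean feature vector times weight vector, then softmax.

    platform_features[k] -- list of feature vectors of data source k
    platform_weights[k]  -- learnable weight vector of data source k (assumed given)
    """
    scores = []
    for feats, weight in zip(platform_features, platform_weights):
        dim = len(feats[0])
        mean_vec = [sum(f[d] for f in feats) / len(feats) for d in range(dim)]
        scores.append(dot(mean_vec, weight))  # raw global importance score
    return softmax(scores)


def feature_attention(feats, query):
    """Feature-level weights inside one data source: feature . query, then softmax."""
    return softmax([dot(f, query) for f in feats])


sources = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
weights = [[1.0, 1.0], [1.0, 1.0]]
print(platform_attention(sources, weights))
print(feature_attention(sources[0], [1.0, 1.0]))
```

In a trained system the weight and query vectors would be updated by backpropagation; here they are fixed inputs so that the normalization step is easy to inspect.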

[0052] S1.2: The complete original feature vectors from each data source are first weighted and fused using platform attention weights to obtain a cross-platform fused feature vector, expressed as:

[0053] $$F = \sum_{k} \alpha_k x_k$$

[0054] where $F$ denotes the cross-platform fused feature vector, $k$ denotes the index of the data source, $\alpha_k$ denotes the platform attention weight of the $k$-th data source, and $x_k$ denotes the complete original feature vector of the $k$-th data source, that is, the original feature vector of that data source with all dimensions retained.

[0055] Within each data source, the original feature vector is weighted a second time using the feature attention weights to obtain the single-platform refined feature vector of the $k$-th data source, expressed as:

[0056] $$\tilde{x}_k = \sum_{j} \beta_{k,j} x_{k,j}$$

[0057] where $\tilde{x}_k$ denotes the single-platform refined feature vector of the $k$-th data source, $j$ denotes the index of a feature within the $k$-th data source, $\beta_{k,j}$ denotes the feature attention weight of the $j$-th feature of the $k$-th data source, and $x_{k,j}$ denotes the original feature vector of the $j$-th feature of the $k$-th data source. The original feature vectors are the initial feature vectors of each data source as collected, without any processing.
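Both weighting steps of S1.2 reduce to an attention-weighted combination of vectors. The sketch below assumes the combination is a weighted sum and that the vectors being combined share one dimension; `weighted_sum` is a hypothetical helper, not a name from the source.

```python
def weighted_sum(weights, vectors):
    """Return sum_i weights[i] * vectors[i], computed dimension by dimension."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


# Cross-platform fusion: platform attention weights applied to per-source vectors.
fused = weighted_sum([0.25, 0.75], [[1.0, 2.0], [3.0, 4.0]])

# Within one data source: feature attention weights applied to its feature vectors.
refined = weighted_sum([0.5, 0.5], [[2.0, 0.0], [0.0, 2.0]])

print(fused, refined)  # [2.5, 3.5] [1.0, 1.0]
```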

[0058] The single-platform refined feature vectors from all data sources are concatenated to form a complete preliminary fusion feature. Batch standardization is then performed on the preliminary fusion feature so that each dimension has mean 0 and variance 1, resulting in a standardized feature vector. Missing values in the standardized feature vector are filled with the median of the current dimension across all samples. After imputation, a complete and stable multi-platform fusion feature vector with uniform dimensions is obtained.
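A minimal sketch of the standardization and median imputation step follows. It assumes missing entries are represented as `None` and that the mean and standard deviation of a dimension are computed over its observed values only; neither detail is stated in the text.

```python
import statistics


def standardize_and_impute(rows):
    """Standardize each dimension, then fill missing entries with the median.

    rows -- list of samples; each sample is a list of floats or None (missing).
    """
    n_dims = len(rows[0])
    out = [list(r) for r in rows]
    for d in range(n_dims):
        observed = [r[d] for r in rows if r[d] is not None]
        mean = statistics.fmean(observed)
        std = statistics.pstdev(observed) or 1.0  # avoid dividing by zero
        standardized = [(v - mean) / std for v in observed]
        median = statistics.median(standardized)  # median of the standardized column
        for r_out, r_in in zip(out, rows):
            r_out[d] = median if r_in[d] is None else (r_in[d] - mean) / std
    return out


batch = [[1.0, None], [2.0, 4.0], [3.0, 8.0]]
print(standardize_and_impute(batch))
```

Because the fill value is taken after standardization, it sits on the same scale as the observed entries of that dimension.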

[0059] Preferably, compared to existing preprocessing methods that simply concatenate or average-weight data from multiple platforms, this method achieves differentiated importance perception between platforms and within features through a hierarchical attention mechanism. This allows important data sources and key features to receive higher expression weights during the fusion process, avoiding interference from irrelevant or noisy features on the regression task. At the same time, the combination of batch standardization and median imputation further improves the stability and integrity of feature distribution, thereby significantly improving the prediction accuracy and convergence speed of the subsequent privacy-preserving regression model, and achieving a better accuracy-privacy tradeoff under the same privacy budget.

[0060] S2: Based on the attention weights generated by the hierarchical attention mechanism, the privacy budget is dynamically allocated using an exponential decay function, and then balanced across platforms through a global privacy budget pool to obtain a personalized privacy budget allocation scheme.

[0061] S2.1: After all platform attention weights and feature attention weights have been collected, they are concatenated in order, first all platform attention weights and then the feature attention weights within each platform, to form a complete one-dimensional joint attention vector.

[0062] Applying an exponential decay function to each element of the joint attention vector maps the attention weights such that larger attention weights correspond to smaller mapping results, thus obtaining the local privacy budget for each element, expressed as:

[0063] $$\varepsilon_i = B \cdot e^{-\lambda a_i}$$

[0064] where $\varepsilon_i$ denotes the local privacy budget obtained for the $i$-th element of the joint attention vector, $B$ denotes the total remaining budget in the global privacy budget pool at the start of the current round, $e$ denotes the base of the natural logarithm, $\lambda$ denotes the decay control coefficient, and $a_i$ denotes the original attention weight (ranging from 0 to 1) of the $i$-th element of the joint attention vector.

[0065] It should also be noted that the decay control coefficient is determined as follows: before the start of each training round, a target compression ratio is set, representing the proportion of the current round's total budget that the element with the highest possible attention weight (i.e., an attention weight equal to 1) is expected to receive as its local privacy budget. The decay control coefficient equals the negative of the natural logarithm of the target compression ratio. With this setting, when the attention weight of an element is exactly 1, its local privacy budget is exactly the current round's total budget multiplied by the target compression ratio, which achieves the intended budget compression for the maximum-attention element.
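The mapping from attention weight to local privacy budget, together with the rule for the decay control coefficient, can be checked numerically with a short sketch. The function and parameter names are hypothetical; the target ratio of 0.1 and the total budget of 10 are made-up illustration values.

```python
import math


def decay_coefficient(target_ratio):
    """Decay control coefficient: negative natural log of the target compression ratio."""
    return -math.log(target_ratio)


def local_budgets(attention, remaining_budget, target_ratio):
    """Map each attention weight in [0, 1] to a local privacy budget.

    Larger attention weights yield smaller budgets (exponential decay).
    """
    lam = decay_coefficient(target_ratio)
    return [remaining_budget * math.exp(-lam * a) for a in attention]


budgets = local_budgets([0.0, 0.5, 1.0], remaining_budget=10.0, target_ratio=0.1)
print(budgets)  # roughly 10, 3.16 and 1: weight 1 receives 10% of the total
```

An attention weight of 0 keeps the full round budget, and an attention weight of 1 is compressed to exactly the target ratio, which matches the property described in [0065].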

[0066] S2.2: After all local privacy budgets are calculated, they are directly aggregated into the global privacy budget pool to form the local privacy budget set for the current round.

[0067] The ratio of the total remaining budget actually available in the current round in the global privacy budget pool to the sum of all the local privacy budgets just collected is calculated. This ratio is the global reallocation coefficient. The current global reallocation coefficient is multiplied by each local privacy budget to obtain the corresponding adjusted local privacy budget. After this multiplication adjustment, the sum of all local privacy budgets is precisely scaled to the total remaining budget of the current round, without exceeding the budget or leaving a gap. The adjusted local privacy budget maintains the same relative size relationship with the original local privacy budget, but is uniformly enlarged or reduced proportionally. This ensures that the differentiated protection strength brought by attention weight is preserved to the greatest extent while strictly adhering to the upper limit of the total remaining budget of the current round.
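The rescaling in [0067] amounts to one ratio and one multiplication. The following sketch uses a hypothetical `reallocate` helper and made-up budget values to show that the adjusted budgets sum to the remaining total while keeping their relative proportions.

```python
def reallocate(local, remaining_budget):
    """Scale local privacy budgets so that they sum to the remaining total budget.

    Returns the global reallocation coefficient and the adjusted budgets.
    """
    total = sum(local)
    if total <= 0:
        raise ValueError("local privacy budgets must sum to a positive value")
    coeff = remaining_budget / total  # global reallocation coefficient
    return coeff, [coeff * b for b in local]


coeff, adjusted = reallocate([10.0, 3.0, 1.0], remaining_budget=7.0)
print(coeff, adjusted, sum(adjusted))  # 0.5 [5.0, 1.5, 0.5] 7.0
```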

[0068] Each adjusted local privacy budget is mapped back to its corresponding platform or specific feature according to its original position in the joint attention vector, and then further mapped to the specific data point to which the current feature belongs, forming a personalized privacy budget allocation scheme for each data point.

[0069] It should also be noted that after the completion of this round of distributed regression training, the portion of the global privacy budget that is not actually consumed remains in the global privacy budget pool. When entering the next iteration, this unconsumed remaining budget is directly added to the corresponding local privacy budget of the next round according to the ratio of platform attention weight to feature attention weight recalculated in the next round, thereby realizing the dynamic reuse and cross-platform balance of the total privacy budget throughout the multi-round training process.

[0070] Preferably, compared to existing methods that uniformly allocate privacy budgets across all platforms and features or simply distribute them equally across platforms, this method achieves fine-grained dynamic privacy budget allocation through an exponential decay function combined with attention weights. This allows important platforms and sensitive features to automatically receive smaller privacy budgets (stronger protection), while less important parts retain more budget to maintain model utility. Combined with global redistribution of the global privacy budget pool and reuse of remaining budgets across rounds, this ensures that the total budget is strictly controlled while maximizing the utilization rate of the budget in each round. Ultimately, under the same total privacy budget constraint, this significantly improves the accuracy of the regression model, optimizes the accuracy-privacy trade-off, and enhances the fairness of data contributions and training stability across multiple platforms.

[0071] S3: Based on feature vectors and a personalized privacy budget allocation scheme, distributed regression training is performed locally on each data source. After adding noise that is adaptive to the privacy budget to the gradient, a global regression model is formed through secure aggregation.

[0072] Before the distributed regression training begins, a global regression model is initialized. The global regression model adopts the same multi-layer neural network structure as the local data source, and the initialized global regression model parameters are broadcast to each data source as the current global regression model parameters for the first round.

[0073] At the start of each training round, each data source receives the current global regression model parameters and copies them completely into the local multilayer neural network as the initial parameters of the local regression model.

[0074] Each data source uses locally held user behavior data and a uniformly dimensional feature vector obtained from S1 to perform forward propagation on the local regression model, calculating the regression loss function (mean squared error). Then, backpropagation is used to obtain the original local gradients of all samples in the current batch. The formula for calculating the regression loss function is:

[0075] $$L = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2$$

[0076] where $L$ denotes the regression loss computed by this data source on the current batch, $n$ denotes the number of samples in the current batch, $i$ denotes the index of a sample in the current batch, from 1 to $n$, $\hat{y}_i$ denotes the local regression model's predicted output for the $i$-th sample of the current batch, and $y_i$ denotes the true target value of the $i$-th sample of the current batch.

[0077] S3.2: Perform gradient clipping on the original local gradient: Set a uniform gradient clipping threshold. If the L2 norm of a sample gradient exceeds the gradient clipping threshold, then scale the current sample gradient until the L2 norm is exactly equal to the gradient clipping threshold.

[0078] It should also be noted that after the original local gradient calculation of all samples in the current batch is completed, the L2 norm of the gradient of all samples is collected, and the norm corresponding to the 95th percentile is selected as the gradient clipping threshold for this batch. This setting can clip extremely large gradients while retaining most of the normal gradient information, and no manual parameter tuning is required.

[0079] For each sample's clipped gradient, the noise intensity is determined based on the personalized privacy budget obtained in step S2.2 for the current sample. Gaussian noise with a mean of 0 is added to the clipped gradient. The noise standard deviation is inversely proportional to the personalized privacy budget and directly proportional to the gradient clipping threshold, thus obtaining the final clipped and noisy gradient.
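The clipping and noise steps of S3.2 can be sketched as below. Several details are assumptions rather than statements from the text: the 95th percentile uses the nearest-rank definition, the noise standard deviation is taken as `noise_scale * threshold / budget` with a hypothetical proportionality constant `noise_scale`, and the random generator is seeded only to make the example repeatable.

```python
import math
import random


def l2_norm(grad):
    return math.sqrt(sum(x * x for x in grad))


def percentile_threshold(norms, q=0.95):
    """Nearest-rank percentile of the per-sample gradient norms (assumed definition)."""
    ordered = sorted(norms)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def clip(grad, threshold):
    """Scale the gradient down so that its L2 norm does not exceed the threshold."""
    norm = l2_norm(grad)
    if norm <= threshold or norm == 0.0:
        return list(grad)
    scale = threshold / norm
    return [x * scale for x in grad]


def clip_and_noise(grads, budgets, noise_scale=1.0, rng=None):
    """Clip each per-sample gradient, then add zero-mean Gaussian noise.

    The noise standard deviation grows with the threshold and shrinks with the budget.
    """
    rng = rng or random.Random(0)
    threshold = percentile_threshold([l2_norm(g) for g in grads])
    noisy = []
    for grad, budget in zip(grads, budgets):
        sigma = noise_scale * threshold / budget
        noisy.append([x + rng.gauss(0.0, sigma) for x in clip(grad, threshold)])
    return threshold, noisy


threshold, noisy = clip_and_noise([[3.0, 4.0], [0.3, 0.4]], budgets=[0.5, 2.0])
print(threshold, noisy)
```

A sample with a smaller personalized budget receives proportionally larger noise, which is the adaptive behavior the step describes. The sketch does not calibrate `noise_scale` to any formal differential privacy guarantee.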

[0080] S3.3: Each data source uploads its clipped and noisy gradients to the secure aggregation server. After receiving the clipped and noisy gradients from all data sources, the secure aggregation server performs a secret sharing operation on the gradient of each data source. For the clipped and noisy gradient vector of the k-th data source, the secure aggregation server selects a secret sharing threshold scheme (e.g., with a total of N aggregation nodes, at least t nodes are needed for recovery). It uses additive secret sharing to divide each dimension of the noisy gradient vector into N secret shares, such that any combination of fewer than t secret shares cannot recover the original value, while any combination of t or more secret shares can reconstruct it completely. The secure aggregation server then sends the N secret shares belonging to the same data source to N pre-deployed aggregation nodes, so that each aggregation node receives exactly one secret share from each data source, all at the same share position, thus achieving secure distribution of all clipped and noisy gradients.

[0081] Each aggregation node performs element-wise addition on its own secret share of the same dimension in the encrypted state to obtain the encrypted aggregation gradient share of the current dimension.

[0082] The obtained encrypted aggregated gradient shares are sent to the secure aggregation server. After receiving the encrypted aggregated gradient shares uploaded by all aggregation nodes, the secure aggregation server performs an addition operation on all shares of the same dimension to obtain the gradient sum of the current dimension. Since the gradient of each data source has been divided into multiple secret shares and evenly distributed to each aggregation node, the addition operation of the shares of all aggregation nodes is exactly equal to the gradient sum of all data sources. The secure aggregation server divides the gradient sum by the total number of data sources participating in this round of training to obtain the plaintext average gradient of the current dimension. The addition and averaging operations are repeated for each dimension of the gradient vector to completely reconstruct the plaintext average gradient of all dimensions. After reconstruction, the secure aggregation server uses the plaintext average gradient to perform a standard gradient descent update on the current global regression model parameters, generates a new round of global regression model parameters, and broadcasts them to each data source.
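The share, sum, and reconstruct flow of S3.3 can be illustrated with the following sketch. It makes several assumptions that are not in the text: gradients are fixed-point encoded with a hypothetical `SCALE`, arithmetic is done modulo a hypothetical prime `PRIME`, and plain additive sharing is used in its N-of-N form, meaning all N node sums are needed for reconstruction. A t-of-N threshold as mentioned in paragraph [0080] would need a threshold scheme such as Shamir's, which is not shown here.

```python
import random

PRIME = 2**61 - 1   # field modulus (assumption, not from the source)
SCALE = 10**6       # fixed-point scale for real-valued gradients (assumption)


def encode(x):
    """Map a real value to a field element."""
    return round(x * SCALE) % PRIME


def decode(v):
    """Map a field element back to a real value, restoring the sign."""
    if v > PRIME // 2:
        v -= PRIME
    return v / SCALE


def share(value, n_nodes, rng):
    """Split one field element into n_nodes additive shares."""
    shares = [rng.randrange(PRIME) for _ in range(n_nodes - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_average(gradients, n_nodes, rng=None):
    """Average the data sources' gradients via per-node share sums."""
    rng = rng or random.SystemRandom()
    dim = len(gradients[0])
    # node_sums[j][d]: node j's running sum of shares for dimension d
    node_sums = [[0] * dim for _ in range(n_nodes)]
    for grad in gradients:
        for d, x in enumerate(grad):
            for j, s in enumerate(share(encode(x), n_nodes, rng)):
                node_sums[j][d] = (node_sums[j][d] + s) % PRIME
    # The server adds the node sums per dimension, then averages.
    totals = [sum(node[d] for node in node_sums) % PRIME for d in range(dim)]
    return [decode(t) / len(gradients) for t in totals]
```

Because each share is uniformly random on its own, a single aggregation node in this sketch learns nothing about an individual gradient; only the sum over all nodes reveals the aggregate.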

[0083] Preferably, compared to existing methods that use differential privacy federated regression with uniform noise intensity or simply distribute the privacy budget equally across platforms, this method introduces adaptive Gaussian noise based on personalized privacy budgets in distributed training. This allows sensitive features and data points with significant platform contributions to automatically receive stronger privacy protection, while non-sensitive parts retain higher model utility. Combined with adaptive gradient clipping and secure aggregation through additive secret sharing, this method effectively controls the amplification of noise by extreme gradients and avoids the risk of gradient leakage. Finally, under the same total privacy budget constraint, it significantly improves the prediction accuracy and training stability of the global regression model, while achieving a more refined privacy-accuracy tradeoff and cross-platform fairness.

[0084] S4: Iteratively optimize the global regression model, and adjust the attention weight and privacy budget allocation in reverse based on the generalization error of local cross-validation on each platform to obtain the user privacy regression model and evaluation report.

[0085] S4.1: After the parameters of the new global regression model are broadcast to each data source in each round, each data source pauses gradient uploading and instead uses its locally held independent validation set (which has never participated in gradient calculation during the entire training process) to perform forward inference on the current new global regression model, calculates the mean squared error of all validation samples, and uses this mean squared error as the generalization error of this data source.

[0086] Each data source uploads its calculated generalization error to the secure aggregation server. The secure aggregation server takes the simple average of the generalization errors of all data sources to obtain the global error signal for the current round.

[0087] The global error signal continues to propagate backward in a differentiable manner: it flows through the backbone multilayer neural network of the current global regression model (keeping the parameters frozen and only used for gradient propagation), and then flows upward into the platform-level attention layer and feature-level attention layer of the hierarchical attention mechanism. This allows the learnable parameters of the platform attention weights and feature attention weights (i.e., the aforementioned learnable weight vector and global query vector) to obtain updated gradients and perform parameter updates based on the global error signal. At the same time, the global error signal continues to flow to the decay coefficient in the exponential decay function, allowing the decay coefficient to also obtain updated gradients and perform parameter adjustments based on the global error signal, thereby synchronously optimizing the attention weight and privacy budget allocation strategy.

[0088] S4.2: During training, the global error signal value is recorded for each round. When the global error signal has not decreased for three consecutive rounds, the early stopping mechanism is triggered immediately and all iterative training stops. Using three rounds instead of one or two effectively avoids premature termination caused by random noise or a single fluctuation, while, compared with a longer window, three rounds stops training more promptly, before the model begins to overfit significantly.
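The per-round global error signal of S4.1 and the three-round stopping rule of S4.2 can be sketched as below. The class and function names are hypothetical, and one interpretation is assumed: "does not decrease" is taken to mean that the error fails to fall below the best value seen so far. The text could also be read as comparing each round only with the previous round.

```python
def mean_squared_error(predictions, targets):
    """Local generalization error of one data source on its validation set."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def global_error_signal(local_errors):
    """Simple average of the generalization errors of all data sources."""
    return sum(local_errors) / len(local_errors)


class EarlyStopper:
    """Stops training after `patience` consecutive rounds without improvement."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.stale_rounds = 0

    def update(self, error):
        """Record one round's global error; return True if training should stop."""
        if error < self.best:
            self.best = error
            self.stale_rounds = 0
        else:
            self.stale_rounds += 1
        return self.stale_rounds >= self.patience
```

A training loop would call `update` once per round with the averaged error and break out of the loop as soon as it returns `True`.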

[0089] After training stops, the global regression model parameters of the current round are output as the final converged user privacy regression model. At the same time, the following are recorded and output together to form a complete evaluation report: the mean squared error of the final user privacy regression model on the local test set of each data source; the difference in mean squared error, on the same test set, between the final user privacy regression model and a benchmark model with the same structure but without any privacy protection measures; the success rate of membership inference attacks simulated by training a shadow model; and the total privacy budget actually consumed throughout the entire training process.
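One possible container for the four report items listed above is sketched here. The field names and the choice of a dataclass are illustrative assumptions; the text specifies only which quantities the report contains, not how they are stored.

```python
from dataclasses import dataclass, asdict


@dataclass
class EvaluationReport:
    """Hypothetical record of the quantities named in the evaluation report."""

    test_mse_per_source: dict        # private model MSE per data source
    baseline_mse_per_source: dict    # same-structure model without privacy protection
    attack_success_rate: float       # simulated membership inference attack
    total_budget_consumed: float     # privacy budget spent over all rounds

    def accuracy_gap(self):
        """Per-source MSE difference between private and baseline models."""
        return {
            source: self.test_mse_per_source[source]
            - self.baseline_mse_per_source[source]
            for source in self.test_mse_per_source
        }

    def to_dict(self):
        """Flatten the report, including the derived accuracy gap."""
        data = asdict(self)
        data["accuracy_gap"] = self.accuracy_gap()
        return data
```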

[0090] Preferably, unlike existing federated regression methods where attention weights and privacy budgets are fixed or adjusted only through fixed hyperparameters, this method backpropagates the global error signal to a hierarchical attention mechanism and an exponential decay function. This allows the platform attention weights, feature attention weights, and privacy budget allocation strategies to automatically and continuously optimize with the model's generalization performance, achieving a dynamic adaptive match between attention and privacy protection strength. Combined with an early stopping mechanism based on a local validation set, it avoids budget waste and overfitting caused by ineffective iterations, thereby achieving better generalization ability under the same total privacy budget. At the same time, the evaluation report comprehensively quantifies the accuracy gap and actual privacy consumption, improving the model's interpretability and reliability for cross-platform deployment.

[0091] This embodiment also provides a computer device applicable to the user privacy regression modeling method that integrates data from multiple platforms, including: a memory and a processor; the memory is used to store computer-executable instructions, and the processor is used to execute the computer-executable instructions to implement the user privacy regression modeling method that integrates data from multiple platforms as proposed in the above embodiment.

[0092] The computer device can be a terminal, comprising a processor, memory, communication interface, display screen, and input devices connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The communication interface is used for wired or wireless communication with external terminals; wireless communication can be achieved through Wi-Fi, carrier networks, NFC (Near Field Communication), or other technologies. The display screen can be an LCD screen or an e-ink screen. The input devices can be a touch layer covering the display screen, buttons, a trackball, or a touchpad on the computer device's casing, or an external keyboard, touchpad, or mouse.

[0093] This embodiment also provides a storage medium storing a computer program that, when executed by a processor, implements the user privacy regression modeling method for integrating multi-platform data as proposed in the above embodiments. The storage medium can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.

[0094] In summary, this invention overcomes the suboptimal utility-privacy trade-off caused by the uniform privacy budget allocation strategy commonly used in existing privacy modeling methods. This is achieved through a personalized privacy budget dynamic allocation mechanism combining a hierarchical attention mechanism and an exponential decay function, along with a global reallocation and cross-round reuse mechanism for the global privacy budget pool. Furthermore, by aggregating the local generalization errors of each platform into a global error signal and backpropagating it to the closed-loop optimization of the attention layer and exponential decay function, real-time coordinated adjustment of attention weights and privacy noise intensity is achieved. This fundamentally solves the shortcomings of existing methods where the attention mechanism is only used for feature representation and is independent of privacy budget allocation, failing to dynamically match data importance with protection strength.

[0095] It should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and all such modifications or substitutions should be covered within the scope of the claims of the present invention.

Claims

1. A user privacy regression modeling method integrating multi-platform data, characterized in that the method comprises: User behavior data is collected from multiple data sources, and a hierarchical attention mechanism is used to preprocess the user behavior data to output a feature vector with uniform dimensions, specifically: User behavior data from different data sources are input into the platform-level attention layer. The global importance score of each platform is calculated and normalized using learnable weight vectors to obtain the platform attention weights. All features from each data source are input into the feature-level attention layer. The contribution score of each feature to the regression task is calculated and normalized to obtain the feature attention weights. The feature vectors of each platform are weighted and fused using platform attention weights, and then the features within each data source are weighted a second time using feature attention weights to obtain preliminary fused features. Batch standardization and median missing value imputation are performed on the preliminary fused features to output a feature vector with uniform dimensions; The user behavior data includes clickstream features, transaction record features, social interaction features, time series features, and device and environment features; Based on the attention weights generated by the hierarchical attention mechanism, a privacy budget is dynamically allocated using an exponential decay function, and a cross-platform balance is achieved through a global privacy budget pool to obtain a personalized privacy budget allocation scheme. Based on the feature vector and the personalized privacy budget allocation scheme, distributed regression training is performed locally on each data source. After adding noise that is adaptive to the privacy budget to the gradient, a global regression model is formed through secure aggregation.
The global regression model is iteratively optimized, and the attention weights and privacy budget allocation are adjusted in reverse based on the generalization error of local cross-validation on each platform to obtain the user privacy regression model and evaluation report.

2. The user privacy regression modeling method integrating multi-platform data as described in claim 1, characterized in that: The attention weights generated based on the hierarchical attention mechanism are dynamically allocated to the privacy budget using an exponential decay function, and then balanced across platforms through a global privacy budget pool to obtain a personalized privacy budget allocation scheme, specifically: Arrange the platform attention weights in order of data source, then append the feature attention weights in each data source in order of feature to the platform attention weights, and concatenate them into a joint attention vector; Each element in the joint attention vector is mapped by applying exponential decay to obtain the corresponding local privacy budget, which is then aggregated into the global privacy budget pool. Based on the proportion of the remaining total budget in the global privacy budget pool, the privacy budgets of each platform are redistributed and adjusted to output a personalized privacy budget allocation scheme.

3. The user privacy regression modeling method integrating multi-platform data as described in claim 2, characterized in that: The process of performing distributed regression training locally on each data source and adding noise adaptive to the privacy budget to the gradient specifically involves: Before distributed regression training begins, a global regression model with the same structure as the local model of each data source is initialized. Each data source uses a multi-layer neural network and receives the current global regression model parameters as the initial parameters of its local regression model. Each data source uses local user behavior data and feature vectors to perform forward propagation and loss calculation to obtain local gradients. Based on the personalized privacy budget corresponding to each data point in the current batch, the noise intensity is determined, noise is added to the local gradients, and gradient clipping is performed to obtain clipped and noisy gradients.

4. The user privacy regression modeling method integrating multi-platform data as described in claim 3, characterized in that: The process of forming a global regression model through secure aggregation is as follows: The clipped and noisy gradients uploaded from each data source are divided into multiple secret shares and distributed to multiple aggregation nodes using a secret sharing method. Each aggregation node performs a secure summation operation in an encrypted state on its secret shares of the same location to obtain the encrypted aggregated gradient share; Multiple aggregation nodes collaborate to reconstruct the plaintext average gradient corresponding to the encrypted aggregated gradient shares, and update the parameters of the current global regression model to generate a new round of global regression model.

5. The user privacy regression modeling method integrating multi-platform data as described in claim 4, characterized in that: The iterative optimization of the global regression model, by adjusting the attention weights and privacy budget allocation in reverse based on the generalization error of local cross-validation on each platform, yields the user privacy regression model, specifically as follows: After each new round of global regression model is released, each platform uses the local validation set to calculate the regression mean square error as the generalization error, and summarizes the generalization errors of all platforms into a global error signal. By backpropagating the global error signal, the platform attention weights and feature attention weights in the hierarchical attention mechanism are updated synchronously, and the decay coefficient of the exponential decay function is also updated. Training stops when the global validation loss no longer decreases after N consecutive rounds, and the finally converged user privacy regression model is output.

6. The user privacy regression modeling method integrating multi-platform data as described in claim 5, characterized in that: The evaluation report includes the mean squared error of the finally converged user privacy regression model on the test sets of the various platforms, the accuracy gap with the model without privacy protection, the success rate of membership inference attacks, and the total privacy budget consumed.

7. The user privacy regression modeling method integrating multi-platform data as described in claim 3, characterized in that: Adding noise to the local gradient and performing gradient clipping refers to performing L2 norm gradient clipping on the local gradient according to a preset gradient clipping threshold, and adding Gaussian noise of corresponding intensity to the clipped gradient.

8. The user privacy regression modeling method integrating multi-platform data as described in claim 7, characterized in that: The global privacy budget pool will redistribute any unused remaining budget to the next round according to the attention weight ratio of the current round after each iteration.

9. The user privacy regression modeling method integrating multi-platform data as described in claim 2, characterized in that: The process involves redistributing and adjusting the local privacy budgets across platforms based on the proportion of the remaining total budget in the global privacy budget pool, and outputting a personalized privacy budget allocation scheme. Specifically: The global redistribution coefficient is obtained by calculating the ratio of the current remaining budget in the global privacy budget pool to the sum of all local privacy budgets. The adjusted local privacy budget is obtained by multiplying each local privacy budget by the global redistribution coefficient. The sum of all adjusted local privacy budgets is updated as the current remaining total budget, and the adjusted local privacy budgets are mapped back to each corresponding data point to output the personalized privacy budget allocation scheme.