Method and apparatus for privacy data protection in a communication network
By employing a privacy-preserving decision-making model based on dual deep Q-learning and Stackelberg game theory, a privacy budget is dynamically allocated, addressing the problem that static privacy budgets in 5G networks cannot adapt to network changes. This achieves a balance between privacy protection and network performance, improving network operating efficiency and security.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA TELECOM ARTIFICIAL INTELLIGENCE TECHNOLOGY (BEIJING) CO LTD
- Filing Date
- 2025-05-14
- Publication Date
- 2026-06-30
AI Technical Summary
Existing differential privacy practices in 5G networks suffer from compromised data accuracy and network optimization decision quality during high-traffic periods due to the inability to dynamically adjust static privacy budgets. Meanwhile, centralized privacy processing cannot meet low-latency requirements, impacting the efficiency of network optimization and security analysis.
A privacy protection decision model based on dual deep Q-learning algorithm and Stackelberg game theory is adopted to dynamically allocate the privacy budget. Through the collaborative optimization of edge nodes and central nodes, a balance between real-time privacy protection and network performance is achieved.
While protecting user privacy, it improves network operating efficiency and security, adapts to the changing needs of dynamic network environments, and meets the requirements for low-latency communication.
Figure CN120498757B_ABST
Description
Technical Field
[0001] This application relates to the field of network information security technology, and more specifically, to a method and apparatus for protecting privacy data in communication networks.
Background Technology
[0002] Against the backdrop of the rapid development of mobile communications and the Internet of Things (IoT), differential privacy, as a powerful privacy protection technology, has been widely used in data sharing and analysis to ensure that personal data is protected from direct or indirect leakage during statistical analysis. However, most existing differential privacy practices rely on static privacy budget allocation strategies, which are gradually revealing their limitations in the highly dynamic 5G network environment, mainly in the following aspects:
[0003] (1) The contradiction between a static privacy budget and a dynamic network environment: 5G networks provide customized Quality of Service (QoS) for different types of services through network slicing, yet traditional differential privacy methods use a fixed privacy budget that cannot be adjusted in real time according to changes in actual network load and user activity patterns. As a result, during high-traffic periods more noise must be injected to maintain the same level of privacy protection, which seriously impairs the accuracy and availability of the data and, in turn, the quality of network optimization decisions.
[0004] (2) Current differential privacy schemes tend to implement data privacy protection on a central server or cloud platform. While this facilitates centralized resource management and unified standards, it also introduces additional communication latency and computational burden. With the popularization of 5G networks and the surge in the number of IoT devices, a large amount of real-time data streams converge at the central node. Traditional centralized privacy processing mechanisms can no longer meet the requirements of millisecond-level low-latency communication, becoming a key factor restricting the efficiency of network optimization and security analysis.
[0005] In conclusion, static privacy budget allocation strategies reveal their lack of flexibility and adaptability when faced with the high speed, low latency, and massive connectivity characteristics of 5G networks. Therefore, how to ensure data privacy protection while maintaining data availability and network service efficiency has become a critical technical challenge that urgently needs to be addressed.
Summary of the Invention
[0006] This application provides a method and apparatus for protecting privacy data in communication networks, which at least solves the technical problem that related technologies using static privacy budget allocation strategies are unable to meet the needs of dynamically changing network optimization and security analysis.
[0007] According to one aspect of the embodiments of this application, a privacy data protection method for communication networks is provided, comprising: acquiring network operation status information of the communication network within a preset time period, and acquiring the user privacy dataset corresponding to a target edge node within the communication network; analyzing the network operation status information and the user privacy dataset using a pre-trained privacy protection decision model to obtain a target privacy budget allocation strategy that minimizes communication latency and meets preset privacy protection requirements, wherein the privacy protection decision model is trained based on a dual deep Q-learning algorithm and Stackelberg game theory, and the target privacy budget allocation strategy includes at least: a first target privacy protection budget allocated to the target edge node and a second target privacy protection budget allocated to the central node; and performing privacy protection processing on the user privacy dataset according to the target privacy budget allocation strategy.
[0008] Optionally, obtaining the user privacy dataset corresponding to the target edge node within the communication network includes: identifying multiple users within the coverage area of the target edge node; for each user, obtaining the user's privacy dataset, wherein the privacy dataset includes at least: real-time location coordinates and a device identifier; adding spatiotemporal labels to the real-time location coordinates using GeoHash encoding to obtain spatiotemporal location data; and forming the user privacy dataset corresponding to the target edge node from the processed privacy datasets of the multiple users within the coverage area of the target edge node.
[0009] Optionally, the training process of the privacy protection decision model includes: constructing an online Q-network and a target Q-network for solving the target privacy budget allocation strategy, and initializing the network weight parameters of the online Q-network and the target Q-network; setting an experience pool and determining its capacity; determining a first preset number of iteration cycles, and iteratively solving the privacy budget allocation strategy and network weight parameters through the following steps: in each iteration cycle, initializing the communication network environment state and determining a second preset number of calculation cycles, wherein the communication network environment state includes at least: the user privacy dataset and network operation status information corresponding to the target edge node in the communication network; in each calculation cycle, determining the second initial privacy protection budget of the central node based on the current communication network environment state, determining the first initial privacy protection budget of the target edge node in combination with the preset total privacy budget, and determining the initial privacy budget allocation strategy based on the first initial privacy protection budget and the second initial privacy protection budget; based on the initial privacy budget allocation strategy, using Stackelberg game theory to conduct multiple rounds of games to obtain multiple privacy budget allocation strategies; inputting the current communication network environment state into the online Q-network and calculating the predicted Q-value corresponding to each privacy budget allocation strategy; selecting, based on a greedy strategy, the target privacy budget allocation strategy with the maximum predicted Q-value, wherein the target privacy budget allocation strategy minimizes communication latency while meeting the preset privacy protection requirements; performing privacy processing according to the target privacy budget allocation strategy to obtain the corresponding reward and the new state of the communication network environment; storing the current state of the communication network environment, the target privacy budget allocation strategy, the reward, and the new state of the communication network environment as one sample in the experience pool; inputting multiple samples randomly drawn from the experience pool into the neural network to calculate the probability of each sample being sampled, the mean squared error loss function, and the loss function weight, and determining the target Q-value corresponding to the second target privacy protection budget based on the calculation results; updating the network weight parameters of the online Q-network using gradient descent based on the predicted Q-value and the target Q-value corresponding to the target privacy budget allocation strategy; after a third preset number of calculation cycles, updating the network weight parameters of the target Q-network based on the network weight parameters of the online Q-network, wherein the third preset number is less than the second preset number; and, after the iterations are complete, using the resulting target Q-network as the privacy protection decision model.
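The pairing of an online Q-network with a periodically synchronized target Q-network resembles the technique commonly called Double DQN. The following is a minimal sketch under that assumption; it is not the application's own implementation. The Q-functions are stand-in lookup tables rather than neural networks, and the names `double_dqn_target`, `should_sync_target`, `gamma`, and `sync_every` are hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical type: a Q-function maps (state, action index) to a scalar value.
QFunc = Callable[[str, int], float]

def double_dqn_target(reward: float, next_state: str, actions: Sequence[int],
                      q_online: QFunc, q_target: QFunc,
                      gamma: float = 0.9, done: bool = False) -> float:
    """Target value y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    if done:
        return reward
    # The online network selects the action for the next state...
    best_action = max(actions, key=lambda a: q_online(next_state, a))
    # ...and the target network evaluates that action.
    return reward + gamma * q_target(next_state, best_action)

def should_sync_target(step: int, sync_every: int) -> bool:
    """True when the target weights should be copied from the online network."""
    return step > 0 and step % sync_every == 0

# Toy lookup tables standing in for the two networks (assumed values).
online_q = {("s1", 0): 1.0, ("s1", 1): 2.0}
target_q = {("s1", 0): 5.0, ("s1", 1): 0.5}
y = double_dqn_target(1.0, "s1", [0, 1],
                      lambda s, a: online_q[(s, a)],
                      lambda s, a: target_q[(s, a)],
                      gamma=0.9)
```

In the toy data the online table prefers action 1, so the target table's value for action 1 is used even though the target table itself rates action 0 higher. Here `sync_every` plays the role of the third preset number.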
[0010] Optionally, the current communication network environment status is input into the online Q network to calculate the predicted Q value corresponding to various privacy budget allocation strategies, including: for each privacy budget allocation strategy, calculating the data utility corresponding to the privacy budget allocation strategy based on the user privacy dataset of the edge node, network operation status information, and privacy budget allocation strategy; and determining the predicted Q value corresponding to each privacy budget allocation strategy based on the long-term data utility value corresponding to each privacy budget allocation strategy.
[0011] Optionally, performing privacy protection processing on the user privacy dataset according to the target privacy budget allocation strategy includes: dividing the preset time period into multiple sub-time periods according to a preset time window, and determining the spatiotemporal sensitivity of the coverage area of the target edge node in each sub-time period according to the following formula:
$$\Delta_T = \alpha \cdot \Delta_{base} + \beta \cdot \log\left(1 + \frac{N_T}{N_{avg}}\right)$$
where $\Delta_T$ represents the spatiotemporal sensitivity of the coverage area of the target edge node in sub-time period T, $\Delta_{base}$ represents the preset spatiotemporal sensitivity baseline value, $N_T$ represents the number of concurrent users within the coverage area of the target edge node during sub-time period T, $N_{avg}$ represents the average number of concurrent users in the coverage area of the target edge node within the preset time period, and $\alpha$ and $\beta$ represent weight parameters with $\alpha + \beta = 1$; performing initial privacy protection processing on the user privacy dataset corresponding to the target edge node based on the first target privacy protection budget and the spatiotemporal sensitivity corresponding to each sub-time period, the initial privacy protection processing including at least: noise addition and desensitization; and performing secondary privacy protection processing on the user privacy dataset corresponding to the target edge node based on the second target privacy protection budget and the spatiotemporal sensitivity corresponding to each sub-time period, the secondary privacy protection processing including at least: noise addition.
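The sensitivity formula can be evaluated directly. The sketch below assumes the logarithm is the natural logarithm, since the text does not state the base; the function name and default weights are hypothetical.

```python
import math

def spatiotemporal_sensitivity(n_t: float, n_avg: float, delta_base: float,
                               alpha: float = 0.5, beta: float = 0.5) -> float:
    """Delta_T = alpha * Delta_base + beta * log(1 + N_T / N_avg)."""
    if not math.isclose(alpha + beta, 1.0):
        raise ValueError("alpha + beta must equal 1")
    if n_avg <= 0:
        raise ValueError("n_avg must be positive")
    # Natural logarithm is an assumption; the source does not give the base.
    return alpha * delta_base + beta * math.log(1.0 + n_t / n_avg)

# A sub-time period whose load equals the average load (assumed numbers).
typical = spatiotemporal_sensitivity(n_t=100, n_avg=100, delta_base=1.0)
# A busier sub-time period yields a larger sensitivity.
busy = spatiotemporal_sensitivity(n_t=400, n_avg=100, delta_base=1.0)
```

Because the user-count term is logarithmic, a fourfold increase in concurrent users raises the sensitivity only moderately rather than proportionally.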
[0012] Optionally, initial privacy protection processing is performed on the user privacy dataset corresponding to the target edge node based on the first target privacy protection budget and the spatiotemporal sensitivity corresponding to each sub-time period. This includes: for each sub-time period, determining the relationship between the spatiotemporal sensitivity within the sub-time period and the spatiotemporal sensitivity benchmark value; if the spatiotemporal sensitivity corresponding to the sub-time period is not less than the spatiotemporal sensitivity benchmark value, adding a first truncated Laplace noise to the spatiotemporal location data of each user under the coverage area of the target edge node according to the first target privacy protection budget, and desensitizing the device identifiers of each user to obtain pseudo-device identifiers; if the spatiotemporal sensitivity corresponding to the sub-time period is less than the spatiotemporal sensitivity benchmark value, adding a second truncated Laplace noise to the spatiotemporal location data of each user under the coverage area of the target edge node according to the first target privacy protection budget, and desensitizing the device identifiers of each user to obtain pseudo-device identifiers; wherein the distribution width of the first truncated Laplace noise is less than the distribution width of the second truncated Laplace noise.
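A minimal sketch of the two-branch noise selection follows. The text only states that the first noise distribution is narrower than the second, so the concrete width factors, the truncation bound of three scales, and the way the scale is derived from the baseline and the budget are all assumptions made for illustration. The sketch shows the sampling only and makes no claim about the resulting privacy guarantee.

```python
import random

def truncated_laplace(scale: float, bound: float, rng: random.Random) -> float:
    """Draw zero-mean Laplace noise with the given scale, rejecting |noise| > bound."""
    while True:
        magnitude = rng.expovariate(1.0 / scale)  # |Laplace(0, scale)| is exponential
        if magnitude <= bound:
            return magnitude if rng.random() < 0.5 else -magnitude

def edge_location_noise(sensitivity: float, baseline: float, epsilon_edge: float,
                        rng: random.Random, narrow_factor: float = 0.5,
                        wide_factor: float = 1.0, bound_in_scales: float = 3.0):
    """Return (noise, bound); the narrow branch applies when sensitivity >= baseline."""
    # Hypothetical width factors: only "first width < second width" comes from the text.
    factor = narrow_factor if sensitivity >= baseline else wide_factor
    scale = factor * baseline / epsilon_edge
    bound = bound_in_scales * scale
    return truncated_laplace(scale, bound, rng), bound

rng = random.Random(0)
noise_high, bound_high = edge_location_noise(1.2, 1.0, 0.5, rng)  # sensitivity >= baseline
noise_low, bound_low = edge_location_noise(0.8, 1.0, 0.5, rng)    # sensitivity < baseline
```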
[0013] Optionally, desensitizing the device identifier of each user to obtain a device pseudo-identifier includes: for each user within the coverage area of the target edge node, determining a preset first random number, wherein the value of the first random number is between 0 and 1; calculating the cumulative probability of different device pseudo-identifiers according to the following formula: [formula not reproduced in the source text]; in the formula, r represents the user's device identifier, o represents the device pseudo-identifier, $\varepsilon_{edge}$ represents the first target privacy protection budget, $d(o,r)$ represents the distance measure between the device pseudo-identifier and the user's device identifier, Z represents the normalization constant, and protocolCheck(o) represents the compliance check function used to perform compliance checks on the device pseudo-identifier; and using the device pseudo-identifier whose cumulative probability is greater than the first random number as the device pseudo-identifier obtained by de-identifying the user's device identifier.
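Because the probability formula itself is not reproduced in the text, the sketch below assumes an exponential-mechanism-style weight, exp(-ε_edge · d(o, r) / 2) · protocolCheck(o) / Z. That form, the Hamming distance, and all function names are assumptions chosen only to illustrate selection by cumulative probability; they are not taken from the application.

```python
import math
from typing import Callable, Optional, Sequence

def hamming(a: str, b: str) -> int:
    """Hypothetical distance measure d(o, r): number of differing characters."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

def pick_pseudo_id(real_id: str, candidates: Sequence[str], epsilon_edge: float,
                   u: float, protocol_check: Callable[[str], bool],
                   distance: Callable[[str, str], int] = hamming) -> Optional[str]:
    """Return the compliant candidate whose cumulative probability first exceeds u."""
    # Assumed weight: exp(-eps * d / 2) for compliant candidates, 0 otherwise.
    weights = [math.exp(-epsilon_edge * distance(o, real_id) / 2.0)
               if protocol_check(o) else 0.0 for o in candidates]
    z = sum(weights)  # normalization constant Z
    if z == 0.0:
        raise ValueError("no compliant candidate")
    chosen, cumulative = None, 0.0
    for o, w in zip(candidates, weights):
        if w == 0.0:
            continue  # non-compliant identifiers can never be selected
        cumulative += w / z
        chosen = o
        if cumulative > u:
            break
    return chosen  # falls back to the last compliant candidate if rounding leaves u uncovered

eps = 2.0 * math.log(2.0)  # makes the toy weights 1, 0.5, 0.25
picked = pick_pseudo_id("AA", ["AA", "AB", "BB"], eps, u=0.6,
                        protocol_check=lambda o: True)
```

Here `u` stands for the first random number; in practice it would be drawn uniformly from [0, 1) once per user.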
[0014] Optionally, performing secondary privacy protection processing on the user privacy dataset corresponding to the target edge node based on the second target privacy protection budget and the spatiotemporal sensitivity corresponding to each sub-time period includes: for each user within the coverage area of the target edge node, calculating the proportion λ of Gaussian noise to be added to the user's privacy dataset according to the following formula: [formula not reproduced in the source text]; in the formula, γ represents the weight parameter and ACC represents the control parameter, whose value changes continuously with the convergence process of the privacy protection decision model, the value of ACC in the early stage of convergence being smaller than its value in the later stage; determining the relationship between the proportion λ and a preset second random number, where the second random number ranges from 0 to 1; if the proportion λ is not less than the second random number, generating Gaussian noise for each sub-time period based on the second target privacy protection budget and the spatiotemporal sensitivity corresponding to each sub-time period, and adding this Gaussian noise to the user's privacy dataset within the corresponding sub-time period; and if the proportion λ is less than the second random number, generating Laplace noise for each sub-time period based on the second target privacy protection budget and the spatiotemporal sensitivity corresponding to each sub-time period, and adding this Laplace noise to the user's privacy dataset within the corresponding sub-time period.
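The choice between the two noise types can be sketched as follows, reading the rule as: Gaussian noise when λ is not less than the random number, Laplace noise otherwise. The proportion λ is taken as an input because its formula is not reproduced in the text. The Gaussian standard deviation uses the classical Gaussian-mechanism calibration with a hypothetical `delta` parameter, which is an assumption and not stated in the application.

```python
import math
import random

def secondary_noise(lam: float, u: float, epsilon_center: float,
                    sensitivity: float, rng: random.Random,
                    delta: float = 1e-5):
    """Return (kind, noise) for one sub-time period at the central node."""
    if lam >= u:
        # Assumed calibration: sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon.
        sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon_center
        return "gaussian", rng.gauss(0.0, sigma)
    # Laplace noise with scale sensitivity / epsilon (standard Laplace mechanism).
    scale = sensitivity / epsilon_center
    magnitude = rng.expovariate(1.0 / scale)
    return "laplace", magnitude if rng.random() < 0.5 else -magnitude

rng = random.Random(1)
kind_a, noise_a = secondary_noise(lam=0.7, u=0.4, epsilon_center=1.0,
                                  sensitivity=1.0, rng=rng)
kind_b, noise_b = secondary_noise(lam=0.2, u=0.4, epsilon_center=1.0,
                                  sensitivity=1.0, rng=rng)
```

Since ACC is said to grow as the model converges, λ presumably shifts over training, which changes how often each branch is taken; the exact direction depends on the omitted formula.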
[0015] According to another aspect of the embodiments of this application, a privacy data protection device for a communication network is also provided, comprising: an acquisition module, configured to acquire network operation status information of the communication network within a preset time period, and acquire the user privacy dataset corresponding to a target edge node within the communication network; a decision module, configured to analyze the network operation status information and the user privacy dataset using a pre-trained privacy protection decision model to obtain a target privacy budget allocation strategy that minimizes communication latency and meets preset privacy protection requirements, wherein the privacy protection decision model is trained based on a dual deep Q-learning algorithm and Stackelberg game theory, and the target privacy budget allocation strategy includes at least: a first target privacy protection budget allocated to the target edge node and a second target privacy protection budget allocated to the central node; and a privacy protection module, configured to perform privacy protection processing on the user privacy dataset according to the target privacy budget allocation strategy.
[0016] According to another aspect of the embodiments of this application, an electronic device is also provided, the electronic device including: a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the above-described privacy data protection method for communication networks through the computer program.
[0017] In this embodiment, a privacy protection decision model pre-trained based on a dual deep Q-learning algorithm and Stackelberg game theory is used to analyze the network operation status information of the communication network and the user privacy dataset of the target edge node within the network. This analysis yields the target privacy budget allocation strategy used for privacy protection processing of the user privacy dataset. This embodiment therefore balances privacy protection and network performance, protecting user privacy while maintaining network operating efficiency and security. This solves the technical problem that related technologies employing static privacy budget allocation strategies struggle to meet the dynamically changing needs of network optimization and security analysis.
Attached Figure Description
[0018] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, illustrate exemplary embodiments and are used to explain this application, but do not constitute an undue limitation of this application. In the drawings:
[0019] Figure 1 is a flowchart illustrating an optional privacy data protection method for communication networks according to an embodiment of this application;
[0020] Figure 2 is a schematic diagram of an optional privacy data protection device for a communication network according to an embodiment of this application;
[0021] Figure 3 is a schematic diagram of the structure of an optional electronic device according to an embodiment of this application.
Detailed Implementation
[0022] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present application, and not all embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort should fall within the scope of protection of the present application.
[0023] It should be noted that the terms "first," "second," etc., used in the specification, claims, and drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0024] To better understand the embodiments of this application, the following is a translation and explanation of some nouns or terms that appear in the description of the embodiments of this application:
[0025] The truncated Laplace mechanism is a variant of the Laplace mechanism. It limits the range of inputs and outputs through the design of its probability density function, ensuring that the output range for a user is consistent under different noise perturbations while satisfying differential privacy with a bounded privacy loss.
[0026] Hash function: A function that transforms data of any size into data of a specific size. The transformed data is called a hash value or hash code.
[0027] GeoHash is a geocoding system that encodes geographic coordinates (longitude and latitude) into strings. It recursively divides the Earth's surface into smaller rectangular regions, each with a unique GeoHash code. The length of the GeoHash code directly affects the precision of the represented region; the longer the code, the smaller the represented region and the higher the precision.
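The recursive subdivision can be illustrated with a short encoder. This is a minimal sketch of the widely used public GeoHash scheme (interleaved longitude and latitude bits, base-32 alphabet without a, i, l, o); whether the application uses exactly this variant is an assumption.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a coordinate as a GeoHash string with `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    refine_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if refine_lon:
            mid = (lon_lo + lon_hi) / 2.0
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2.0
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        refine_lon = not refine_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)

coarse = geohash_encode(39.9087, 116.3975, precision=4)
fine = geohash_encode(39.9087, 116.3975, precision=8)
```

A longer code always extends the shorter code for the same point, which is why truncating a GeoHash yields the enclosing, coarser cell.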
[0028] Stackelberg games are a class of non-cooperative game models in game theory in which the players are divided into two roles: leaders and followers. The leader decides first, and the followers react after observing the leader's decision. The leader therefore acts as the initiator, anticipating the followers' reactions when formulating its strategy, while the followers choose their optimal decisions given the leader's strategy.
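The leader-follower structure can be illustrated by backward induction over a small grid of budget splits. In this sketch the central node is treated as the leader and the edge node as the follower, which is an assumption; the utility functions are hypothetical placeholders and do not come from the application.

```python
import math
from typing import Callable, Tuple

Utility = Callable[[float, float], float]  # (eps_center, eps_edge) -> payoff

def solve_stackelberg(total_budget: float, leader_u: Utility, follower_u: Utility,
                      steps: int = 50) -> Tuple[float, float]:
    """Grid search: the leader picks eps_center anticipating the follower's best response."""
    best = None
    for i in range(steps + 1):
        eps_c = total_budget * i / steps
        remaining = total_budget - eps_c
        # Follower's best response within the budget the leader leaves over.
        eps_e = max((remaining * j / steps for j in range(steps + 1)),
                    key=lambda e: follower_u(eps_c, e))
        value = leader_u(eps_c, eps_e)
        if best is None or value > best[0]:
            best = (value, eps_c, eps_e)
    return best[1], best[2]

# Hypothetical utilities: data utility grows with budget, privacy cost is linear.
follower_u = lambda c, e: math.log(1.0 + e) - 0.3 * e
leader_u = lambda c, e: math.log(1.0 + c) + math.log(1.0 + e) - 0.2 * (c + e)

eps_center, eps_edge = solve_stackelberg(2.0, leader_u, follower_u)
```

The follower's choice is computed inside the leader's loop, which is the defining feature of a Stackelberg solution: the leader optimizes over the follower's reaction rather than over a fixed follower strategy.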
[0029] Example 1
[0030] According to an embodiment of this application, a privacy data protection method for communication networks is provided. It should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in a computer system such as a set of computer-executable instructions. Furthermore, although a logical order is shown in the flowchart, in some cases, the steps shown or described may be executed in a different order than that shown here.
[0031] Figure 1 is a flowchart illustrating a privacy data protection method for communication networks according to an embodiment of this application. As shown in Figure 1, the method includes the following steps:
[0032] Step S102: Obtain the network operation status information of the communication network within a preset time period and the user privacy dataset corresponding to the target edge node in the communication network.
[0033] In the technical solution provided in step S102 above, the network operation status information is used to reflect the network monitoring status of the communication network, while the user privacy dataset includes the privacy datasets of multiple users within the coverage area of the target edge node (base station or gateway), such as the user's spatiotemporal location information, device fingerprint information, network signaling data, etc. These data together constitute a real-time status snapshot describing the network health status and user behavior patterns.
[0034] Step S104: Analyze the network operation status information and user privacy dataset using a pre-trained privacy protection decision model to obtain a target privacy budget allocation strategy that minimizes communication latency and meets preset privacy protection requirements.
[0035] In the technical solution provided in step S104 above, the aforementioned privacy protection decision model employs an intelligent decision-making framework based on a dual-deep Q-learning algorithm and Stackelberg game theory. It comprehensively considers real-time network conditions and the security requirements of user data to generate a privacy budget allocation strategy that both protects privacy and optimizes network performance. This strategy includes at least a first target privacy protection budget allocated to specific target edge nodes and a second target privacy protection budget allocated to the central node, used to guide subsequent noise injection and data processing operations.
[0036] Step S106: Perform privacy protection processing on the user privacy dataset according to the target privacy budget allocation strategy.
[0037] In the technical solution provided in step S106 above, according to the target privacy budget allocation strategy, namely the first target privacy protection budget allocated to the target edge node and the second target privacy protection budget allocated to the central node, the corresponding privacy protection strategy is implemented at the target edge node and the central node, thereby achieving precise protection of the user privacy dataset.
[0038] Based on the scheme defined in steps S102 to S106 above, it can be understood that in this embodiment, the system uses a privacy-preserving decision model pre-trained with a dual-deep Q-learning algorithm and Stackelberg game theory to intelligently analyze the network operation status information of the communication network and the user privacy dataset of the target edge nodes within the network, thereby obtaining the optimal target privacy budget allocation strategy for privacy-preserving processing of the user privacy dataset. Therefore, this embodiment effectively balances privacy protection and network performance, achieving the goal of protecting user privacy while ensuring network operating efficiency and security.
[0039] The following section describes each step of the privacy data protection method for communication networks, using a specific implementation process as an example.
[0040] As an optional implementation, in the technical solution provided in step S102 above, the system can capture network operation status information of the communication network in real time through built-in sensors or external interfaces, including but not limited to: bandwidth utilization, queue waiting time, attack threat level, historical attack frequency, risk of sensitive field exposure, edge CPU utilization, central memory margin, storage I/O pressure, network privacy budget margin, data signal-to-noise ratio, spatiotemporal risk index, etc.
[0041] Simultaneously, the system can analyze the geographical distribution and base station configuration information of the communication network to determine the service area boundaries of the target edge node, and monitor and record the privacy datasets of all users within the service area of the target edge node, including but not limited to: real-time location coordinates (containing longitude, latitude, and real-time timestamp), device identifiers (such as the Media Access Control (MAC) address, the International Mobile Equipment Identity (IMEI), and the IP address), and network signaling data (Reference Signal Received Power (RSRP), Signal-to-Interference-plus-Noise Ratio (SINR), Channel Quality Indicator (CQI), etc.). This data can be obtained directly from the user's terminal device or inferred indirectly through base station and network signaling. Furthermore, since user location, time, and network behavior are closely correlated in mobile communication scenarios, this application adds spatiotemporal labels to the real-time location coordinates of each user so that subsequent analysis algorithms can explore these correlations.
[0042] GeoHash encoding is used to discretize the real-time location coordinates of each user, and then combined with a second-level timestamp to form a (longitude, latitude, time) triplet as a spatiotemporal label for the location data. This transforms longitude, latitude, and real-time timestamp into a standardized spatiotemporal feature vector, enabling precise association of the spatial location and temporal order of data points in subsequent processing. Finally, the processed privacy datasets of multiple users within the coverage area of the target edge node are combined to form the user privacy dataset corresponding to the target edge node.
[0043] Specifically, the process of adding spatiotemporal labels to real-time location coordinates can be implemented by the following steps:
[0044] Step 1: Obtain input parameters (longitude lon, latitude lat, real-time timestamp);
[0045] Step 2: Divide the real-time timestamp into discrete time units according to a preset time granularity. For example, if the preset time granularity is 5 minutes (300 seconds), the discrete time unit can be represented as t = floor(timestamp / 300);
[0046] Step 3: Apply GeoHash encoding to map (longitude lon, latitude lat) to fixed-precision two-dimensional grid coordinates (x, y). The precision of the GeoHash encoding can be set according to the actual application scenario, such as 6 characters or 8 characters.
[0047] Step 4: Output the discrete time unit t and the two-dimensional grid coordinates (a two-dimensional integer encoding based on GeoHash), i.e., output (x, y, t).
[0048] For example, if a user is located at (116.3975°E, 39.9087°N) at 14:45:03 Beijing time on 2023-08-01 (the corresponding real-time timestamp is 1690872303), the real-time timestamp is discretized with a time granularity of 5 minutes, and the location is discretized using GeoHash encoding with a precision of 6 characters. The final spatiotemporal coordinates are then output as (x = 12543, y = 28791, t = 5636241).
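The labeling steps above can be sketched as follows, assuming a GeoHash precision of 6 base-32 characters (30 bits, split evenly between longitude and latitude) and a plain row/column grid index as the integer encoding. The application does not specify how its integer grid coordinates are derived, so the x and y values produced here are not expected to match the worked example; only the time unit t is directly comparable. The function name and parameters are hypothetical.

```python
def spatiotemporal_label(lon: float, lat: float, timestamp: int,
                         granularity_s: int = 300, geohash_chars: int = 6):
    """Return (x, y, t): grid column, grid row, and discrete time unit."""
    total_bits = 5 * geohash_chars          # each GeoHash character carries 5 bits
    lon_bits = (total_bits + 1) // 2        # longitude gets the extra bit if odd
    lat_bits = total_bits // 2
    # Assumed integer encoding: index of the cell along each axis.
    x = min(int((lon + 180.0) / 360.0 * (1 << lon_bits)), (1 << lon_bits) - 1)
    y = min(int((lat + 90.0) / 180.0 * (1 << lat_bits)), (1 << lat_bits) - 1)
    t = int(timestamp) // granularity_s     # floor division into 5-minute units
    return x, y, t

x, y, t = spatiotemporal_label(116.3975, 39.9087, 1690872303)
```

Floor division is what makes every timestamp within the same 5-minute window map to the same t, so records from that window share a label.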
[0049] As an optional implementation, in the technical solution provided in step S104 above, the training process of the privacy protection decision model may include:
[0050] First, an online Q-network and a target Q-network (with identical network structures) are constructed to solve the target privacy budget allocation strategy, and the network weight parameters of the online Q-network and the target Q-network are initialized. The online Q-network evaluates the value of actions in the current state, while the target Q-network predicts the value of future states; their weights are initialized with the same value to ensure consistency in the starting point of training.
[0051] Next, an experience pool is set up and its capacity is determined. The experience pool is used to store sequence samples of past states-actions-rewards-new states, and the capacity of the experience pool is preset to control the stored sequence samples and ensure that the model can learn from sufficient experience.
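A fixed-capacity experience pool can be sketched with a bounded queue. The sketch also shows priority-proportional sampling with importance weights, in the spirit of the importance-related sampling probability mentioned in the training steps; the specific weight formula 1 / (N · P(i)) and the class and method names are assumptions, not details from the application.

```python
import random
from collections import deque

class ExperiencePool:
    """Bounded store of (state, action, reward, next_state) samples."""

    def __init__(self, capacity: int):
        self.samples = deque(maxlen=capacity)     # oldest samples are evicted first
        self.priorities = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, priority: float = 1.0):
        self.samples.append((state, action, reward, next_state))
        self.priorities.append(priority)

    def sample(self, batch_size: int, rng: random.Random):
        """Return a list of (sample, probability, loss_weight) tuples."""
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        n = len(self.samples)
        indices = rng.choices(range(n), weights=probs, k=batch_size)
        # Assumed loss weight: 1 / (N * P(i)), which offsets non-uniform sampling.
        return [(self.samples[i], probs[i], 1.0 / (n * probs[i])) for i in indices]

pool = ExperiencePool(capacity=2)
pool.store("s0", "a0", 0.1, "s1")
pool.store("s1", "a1", 0.2, "s2")
pool.store("s2", "a2", 0.3, "s3")   # evicts the ("s0", ...) sample
batch = pool.sample(batch_size=4, rng=random.Random(0))
```

With equal priorities every sample is drawn with the same probability and every loss weight equals 1, so the scheme reduces to uniform replay.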
[0052] Determine the first preset number of iteration cycles (i.e., the number of iterations to perform multiple loop optimizations on the policy), and iteratively solve the privacy budget allocation policy and network weight parameters through the following steps:
[0053] Within each iteration cycle, the communication network environment state is initialized, and a second preset number of calculation cycles is determined (i.e., the number of repetitions of real-time state evaluation and action selection in each iteration). The communication network environment state includes at least: the user privacy dataset and network operation status information corresponding to any edge node in the communication network.
[0054] Each calculation cycle includes the following steps:
[0055] Step 1: Determine the second initial privacy protection budget for the central node based on the current communication network environment, and determine the first initial privacy protection budget for the target edge node in combination with the preset total privacy budget. Determine the initial privacy budget allocation strategy (i.e., the starting point of the game) based on the first and second initial privacy protection budgets.
[0056] Step 2: Based on the initial privacy budget allocation strategy, Stackelberg game theory is used to conduct multiple rounds of games to obtain various privacy budget allocation strategies.
[0057] Step 3: Input the current communication network environment status into the online Q network and calculate the predicted Q value corresponding to each privacy budget allocation strategy.
[0058] Step 4: Select the target privacy budget allocation strategy with the maximum predicted Q value based on the greedy strategy. The target privacy budget allocation strategy is the privacy budget allocation strategy that minimizes communication latency and meets the preset privacy protection requirements.
[0059] Step 5: Perform corresponding privacy processing (such as noise injection, data obfuscation, etc.) according to the target privacy budget allocation strategy, and obtain corresponding rewards (such as reduction of privacy leakage risk, changes in network latency, etc.) and a new state of the communication network environment;
[0060] Step 6: Take the current communication network environment status, target privacy budget allocation strategy, rewards, and new communication network environment status as a sample, and then store the sample in the experience pool;
[0061] Step 7: Randomly sample multiple samples from the experience pool for the online Q-network to learn from. The probability of a sample being drawn is related to its importance, which focuses learning on the experience fragments that contribute most. The neural network then calculates a loss for each sample, combining a mean squared error loss function with per-sample loss weights so that each sample's contribution is reflected accurately. Based on the calculation results, the target Q-value corresponding to the target privacy budget allocation strategy is then determined.
[0062] Step 8: Based on the predicted Q-value and target Q-value corresponding to the target privacy budget allocation strategy, update the network weight parameters of the online Q-network through backpropagation and gradient descent to optimize the model's predictive ability.
[0063] Then, after a third preset number of computation cycles (usually less than the second preset number), the system can update the network weight parameters of the target Q-network based on the network weight parameters of the online Q-network. This helps to smooth the prediction curve of the target network, prevent model overfitting, and improve stability. As training progresses, the model gradually learns how to optimally allocate the first privacy-preserving budget to edge nodes and the second privacy-preserving budget to center nodes, given a total privacy-preserving budget.
[0064] Finally, after iterative training, the resulting target Q-network is used as a privacy protection decision model to guide the formulation of privacy protection strategies in real-time network environments, ensuring that network performance is maximized while maintaining data security.
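The interplay of the online Q-network and the target Q-network in the steps above can be illustrated with the standard Double DQN target, in which the online network selects the next action and the target network scores it. This application does not spell out the target formula, so the formula, the discount factor `gamma`, and the function names below are assumptions; the two networks are stood in for by plain callables.

```python
from typing import Callable, Sequence

# A Q-function maps a state to one Q-value per candidate strategy (action).
QFunction = Callable[[tuple], Sequence[float]]


def double_q_target(reward: float, new_state: tuple, online_q: QFunction,
                    target_q: QFunction, gamma: float = 0.9) -> float:
    """Target value: the online network picks the action, the target network scores it."""
    online_values = online_q(new_state)
    best_action = max(range(len(online_values)), key=lambda a: online_values[a])
    return reward + gamma * target_q(new_state)[best_action]


def squared_error(predicted_q: float, target: float, weight: float = 1.0) -> float:
    """Weighted squared error for one sample; `weight` is the per-sample loss weight."""
    return weight * (predicted_q - target) ** 2


# Hypothetical Q-values for three candidate strategies in some new state.
online_q = lambda s: [1.0, 3.0, 2.0]   # online network prefers strategy 1
target_q = lambda s: [0.5, 1.5, 4.0]   # target network scores strategy 1 at 1.5
y = double_q_target(1.0, (0,), online_q, target_q, gamma=0.5)
```

Decoupling action selection from action evaluation in this way is what distinguishes the double variant from plain deep Q-learning, where the same network would both select and score the action (here it would have used 4.0 instead of 1.5).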
[0065] Specifically, the objective function of the Stackelberg game process described above can be written as:
[0066] $\max_{\varepsilon_{edge}} \min_{\varepsilon_{center}} \left[ \mathrm{PrivacyRisk} + \eta \cdot \mathrm{CommDelay} \right]$
[0067] In the formula, PrivacyRisk represents the privacy risk value, i indexes all nodes participating in privacy protection within the system (including the central node and privacy nodes), and CommDelay represents the communication latency. QueueLength represents queue latency (the waiting time of privacy data before it is processed by edge nodes), which depends on factors such as network load, packet size, and the processing capacity of network devices. DataSize represents transmission latency (the time required for privacy data to be transmitted from the terminal to the central node). Transmission latency is affected by the characteristics of the transmission medium (such as the speed of electromagnetic wave propagation and fiber optic transmission speed), signal interference, channel conditions (such as bandwidth and signal strength), and the size of the data packet itself. Additionally, η and ω represent weight parameters.
[0068] The Stackelberg game process described above will be briefly explained in the following two rounds of the game.
[0069] First, initialize the communication network environment state and the total network privacy budget $\varepsilon_{total}$: network load = 0.82 (high), attack threat = 2 (high risk), remaining network privacy budget = 0.6, model convergence = 0.15, $\varepsilon_{total}$ = 2;
[0070] In the first round of the game, the central node (i.e., the leader) first selects an action (the second initial privacy budget) based on the current state of the communication network environment, such as $\varepsilon_{center}$ = remaining network privacy budget - 0.1 = 0.6; at this point, the remaining network privacy budget is updated to 2 - 0.6 = 1.4. Then, subject to the constraint $\varepsilon_{edge}$ ≤ 1.4, the edge node (i.e., the follower) selects an action (the first initial privacy budget), i.e., $\varepsilon_{edge}$ = 1.2, to maximize data utility; then, based on the aforementioned privacy budget
[0071] In the second round of the game, the central node (i.e., the leader) first selects an action (the second initial privacy budget) based on the current state of the communication network environment, such as $\varepsilon_{center}$ = remaining network privacy budget + 0.1 = 0.7; at this point, the remaining network privacy budget is updated to 2 - 0.7 = 1.3. Then, subject to the constraint $\varepsilon_{edge}$ ≤ 1.3, the edge node (i.e., the follower) selects an action (the first initial privacy budget), i.e., $\varepsilon_{edge}$ = 1.0, to maximize data utility; then, based on the aforementioned privacy budget
[0072] Compared to the first game round, the immediate reward is significantly improved in the second game round. Therefore, by conducting multiple rounds of the game in this manner, a target privacy budget allocation strategy that minimizes communication latency and meets the preset privacy protection requirements can be obtained.
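The leader-follower structure of these rounds can be sketched as follows: for each candidate action of the central node, the edge node picks its best response among the actions that fit under the remaining budget. The candidate action sets and the `follower_utility` function are hypothetical, so the edge-node choices produced here need not match the values in the rounds above; only the budget constraint is taken from the text.

```python
from typing import Callable, List, Sequence, Tuple


def stackelberg_rounds(
    total_budget: float,
    leader_actions: Sequence[float],
    follower_actions: Sequence[float],
    follower_utility: Callable[[float, float], float],
) -> List[Tuple[float, float]]:
    """Return one (eps_center, eps_edge) strategy per feasible leader action."""
    strategies = []
    for eps_center in leader_actions:
        remaining = total_budget - eps_center
        # Small tolerance so that floating-point rounding does not exclude
        # an action that sits exactly on the budget boundary.
        feasible = [e for e in follower_actions if e <= remaining + 1e-12]
        if not feasible:
            continue  # no follower action fits under the remaining budget
        eps_edge = max(feasible, key=lambda e: follower_utility(eps_center, e))
        strategies.append((eps_center, eps_edge))
    return strategies


# Hypothetical follower utility: data utility simply grows with the edge budget.
result = stackelberg_rounds(2.0, [0.6, 0.7], [1.0, 1.2, 1.4], lambda c, e: e)
```

Each returned pair is one candidate privacy budget allocation strategy that could then be scored by the online Q-network.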
[0073] Optionally, the system calculates the predicted Q-value for each privacy budget allocation strategy in the following manner:
[0074] Step 1: For each privacy budget allocation strategy, calculate the data utility corresponding to the privacy budget allocation strategy based on the user privacy dataset of the edge node, the network operation status information, and the privacy budget allocation strategy.
[0075] Data utility can quantify the impact of privacy protection measures on data availability, network performance, and user service experience. Therefore, when calculating the data utility corresponding to each privacy budget allocation strategy, this application embodiment can comprehensively reflect the utility value of data in different dimensions through the following utility evaluation indicators, including:
[0076] (1) Data accuracy: By comparing the data values before and after processing, the extent to which the strategy causes loss of data authenticity is evaluated.
[0077] (2) Data integrity: Check whether the data is complete and intact, including missing values, missing data fragments, etc.
[0078] (3) Data consistency: Evaluate the correlation or consistency of processed data at different points in time or between different data streams.
[0079] (4) Network performance indicators: such as QoS indicators, including latency, throughput, packet loss rate, etc., which reflect the impact of policies on network service quality.
[0080] (5) User behavior prediction accuracy: The accuracy of predicting user behavior patterns is evaluated by analyzing the processed data.
[0081] Then, time series analysis, regression models, or deep learning algorithms are used to make long-term predictions of the utility of the simulated data. The predicted utility indicators are converted into numerical values through normalization or standardization methods. The prediction results of different utility indicators are then merged into an overall long-term data utility value through weighted averaging or comprehensive scoring. In the weighting process, the weights corresponding to the prediction results of different utility indicators should be determined according to the specific application scenario and importance. This application does not impose specific restrictions on this.
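The normalization and weighted-averaging step can be sketched as below. The indicator names, value ranges, and weights are placeholders chosen for illustration, and the sketch assumes every indicator has already been oriented so that a higher value means higher utility (a latency indicator would need to be inverted first).

```python
from typing import Dict, Tuple


def min_max_normalize(value: float, low: float, high: float) -> float:
    """Scale value into [0, 1]; values outside [low, high] are clipped."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def overall_utility(
    indicators: Dict[str, float],
    ranges: Dict[str, Tuple[float, float]],
    weights: Dict[str, float],
) -> float:
    """Weighted average of normalized indicators (all assumed higher-is-better)."""
    total_weight = sum(weights[name] for name in indicators)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(
        weights[name] * min_max_normalize(indicators[name], *ranges[name])
        for name in indicators
    )
    return score / total_weight


# Hypothetical predicted indicators for one allocation strategy.
utility = overall_utility(
    {"accuracy": 0.9, "throughput": 50.0},
    {"accuracy": (0.0, 1.0), "throughput": (0.0, 100.0)},
    {"accuracy": 3.0, "throughput": 1.0},
)
```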
[0082] Step 2: Determine the predicted Q-values for each privacy budget allocation strategy based on its long-term data utility. The predicted Q-values reflect the long-term expected benefits of adopting a particular privacy budget allocation strategy under the current communication network environment.
[0083] In other words, the system maps the long-term data utility value corresponding to each privacy budget allocation strategy to a Q-value, ensuring that the model focuses not only on immediate rewards during the learning process but also, and more importantly, on the long-term effects of the strategies. Furthermore, by comparing the long-term data utility and predicted Q-values under different privacy budget allocation strategies, the privacy-preserving decision-making model can select strategies that minimize the impact on data utility and network performance while protecting privacy. This process is repeated, allowing the model's predictive ability to improve through comparison with actual long-term data utility. Simultaneously, the privacy budget allocation strategy is dynamically adjusted and optimized through this cycle, ensuring a continuous balance between privacy protection and data utility.
[0084] Therefore, the above model training process establishes policy interactions between central and edge nodes through dynamic game theory, and solves for the optimal privacy parameter configuration through deep reinforcement learning to respond in real time to changes in network state across multiple dimensions, such as network load and attack threats. Compared to static federated learning frameworks, this mechanism can more finely control the global privacy budget, enabling intelligent adjustment of the privacy protection budget under network operating conditions. This ensures that the system can find the optimal balance between privacy protection and network performance in a constantly changing environment, improving the security and efficiency of data processing.
[0085] Furthermore, the system can call the privacy protection decision model that has been trained to analyze the network operation status information and user privacy dataset to obtain a target privacy budget allocation strategy that minimizes communication latency and meets preset privacy protection requirements.
[0086] As an optional implementation, in the technical solution provided in step S106 above, the system can perform privacy protection processing on the user privacy dataset corresponding to the target edge node using the following method:
[0087] Step S1061: Divide the preset time period into multiple sub-time periods according to the preset time window, and determine the spatiotemporal sensitivity of the coverage area of the target edge node in each sub-time period according to the following formula:
[0088] $\Delta_T = \alpha \cdot \Delta_{base} + \beta \cdot \log\left(1 + N_T / N_{avg}\right)$
[0089] In the formula, $\Delta_T$ represents the spatiotemporal sensitivity of the coverage area of the target edge node within sub-time period T (the larger $\Delta_T$ is, the more accurately it reflects the data sensitivity under different time and space conditions, based on changes in user distribution density and network conditions); $\Delta_{base}$ represents the preset spatiotemporal sensitivity baseline value (e.g., set to 1, obtained from historical data statistics); $N_T$ represents the number of concurrent users within the coverage area of the target edge node during sub-time period T (obtained through real-time statistics to reflect user density within sub-time period T; the larger $N_T$ is, the stronger the data correlation and the higher the privacy risk); $N_{avg}$ represents the number of concurrent users in the coverage area of the target edge node within the preset time period (obtained through time window statistics and used to reflect the average user density in the same sub-time period in the past); α and β represent weight parameters (used to control the balance between the static baseline and dynamic adjustment), with α + β = 1.
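As a minimal sketch, the sensitivity formula can be evaluated directly. The function name, the default weights of 0.5, and the use of the natural logarithm are assumptions; the application does not state the base of the logarithm.

```python
import math


def spatiotemporal_sensitivity(n_t: float, n_avg: float, delta_base: float = 1.0,
                               alpha: float = 0.5, beta: float = 0.5) -> float:
    """Delta_T = alpha * Delta_base + beta * log(1 + N_T / N_avg), natural log assumed."""
    if n_avg <= 0:
        raise ValueError("n_avg must be positive")
    if abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("alpha and beta must sum to 1")
    return alpha * delta_base + beta * math.log(1.0 + n_t / n_avg)


# Hypothetical user counts: a quiet, an average, and a busy sub-time period.
quiet = spatiotemporal_sensitivity(0, 100)
average = spatiotemporal_sensitivity(100, 100)
busy = spatiotemporal_sensitivity(300, 100)
```

Under these assumptions, with a baseline of 1 and α + β = 1, the sensitivity reaches the baseline only when N_T / N_avg is at least e − 1 (about 1.72), so a sub-time period with average user density falls below the baseline.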
[0090] Through the above time aggregation processing, massive real-time data streams can be segmented and processed to output aggregation results in a timely manner, thereby compressing the data size and reducing storage and transmission overhead.
[0091] Step S1062: Perform initial privacy protection processing on the user privacy dataset corresponding to the target edge node based on the first target privacy protection budget and the spatiotemporal sensitivity corresponding to each sub-time period. The initial privacy protection processing includes at least: noise addition processing and desensitization processing.
[0092] Step S1063: Perform secondary privacy protection processing on the user privacy dataset corresponding to the target edge node based on the second target privacy protection budget and the spatiotemporal sensitivity corresponding to each sub-time period. The secondary privacy protection processing includes at least: noise addition processing.
[0093] Optionally, in the technical solution provided in step S1062 above, the initial privacy protection operation includes:
[0094] For each sub-time period, determine the relationship between the spatiotemporal sensitivity within the sub-time period and the spatiotemporal sensitivity baseline value;
[0095] If the spatiotemporal sensitivity corresponding to the sub-time period is not less than the spatiotemporal sensitivity benchmark value, add a first amount of truncated Laplace noise to the spatiotemporal location data of each user under the target edge node coverage area according to the first target privacy protection budget, and perform desensitization processing on the device identifier of each user to obtain the device pseudo-identifier.
[0096] If the spatiotemporal sensitivity corresponding to the sub-time period is less than the spatiotemporal sensitivity benchmark value, a second amount of truncated Laplace noise is added to the spatiotemporal location data of each user under the target edge node coverage area according to the first target privacy protection budget, and the device identifier of each user is desensitized to obtain the device pseudo-identifier.
[0097] The distribution width of the first truncated Laplace noise is smaller than that of the second truncated Laplace noise. This is because when the number of concurrent users in the coverage area of the target edge node is high (i.e., the area is densely populated), the spatiotemporal sensitivity is high, the noise scale is small, and therefore relatively less noise is added to reduce data distortion. Conversely, when the number of concurrent users in the coverage area of the target edge node is low (i.e., the area is sparsely populated), the spatiotemporal sensitivity is low, the noise scale is large, and more noise is added to enhance privacy protection. This noise addition strategy based on real-time spatiotemporal sensitivity adjustment ensures reduced utility loss in data-dense areas and enhanced privacy protection in sparse areas. It overcomes the problem of excessive utility loss caused by fixed sensitivity in existing technologies, improves the availability of location coordinate information, and makes the privacy protection strategy more accurate.
[0098] Specifically, the process of adding truncated Laplace noise is as follows: First, the system determines the coordinate offset error δ of the differential privacy protection mechanism based on the application scenario requirements and privacy protection needs. Then, based on the first target privacy protection budget allocated to the target edge node and the spatiotemporal sensitivity corresponding to each sub-time period, the system calculates the scale parameter of the truncated Laplace noise for each sub-time period, where this scale parameter determines the noise distribution width. Next, the amount of truncated Laplace noise is determined based on the amount of data in the user privacy dataset within each sub-time period, and the resulting truncated Laplace noise is added to the original data to obtain the blurred data, thus ensuring that each piece of data receives appropriate privacy protection. For example, for the original coordinates (lat, lon), the blurred coordinates are (lat + $\mathrm{noise}_{lat}$, lon + $\mathrm{noise}_{lon}$), where $\mathrm{noise}_{lat}$ and $\mathrm{noise}_{lon}$ are generated from a truncated Laplace distribution. Finally, to prevent the offset error of the blurred location coordinates from exceeding the coordinate offset error δ due to the addition of noise, the location coordinates after adding noise can be restricted to adjust them to the nearest valid coordinate point. This ensures that the offset error of the blurred location coordinates does not exceed the coordinate offset error δ, thereby greatly reducing geographic distortion. Furthermore, the size of the blurred data is smaller, which can significantly reduce the cost of data transmission.
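One way to illustrate bounded Laplace noise on coordinates is rejection sampling: draw ordinary Laplace noise and redraw until it falls within the offset bound. This is a sketch under assumptions, not the procedure of this application: the scale parameter is passed in rather than derived from the budget and sensitivity, the bound δ is applied per axis, and the function names are hypothetical.

```python
import random


def truncated_laplace(scale: float, bound: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise conditioned on |noise| <= bound."""
    if scale <= 0 or bound <= 0:
        raise ValueError("scale and bound must be positive")
    while True:
        # A Laplace variate is an exponential magnitude with a random sign.
        magnitude = rng.expovariate(1.0 / scale)
        if magnitude <= bound:
            sign = 1.0 if rng.random() < 0.5 else -1.0
            return sign * magnitude


def blur_coordinates(lat: float, lon: float, scale: float, delta: float,
                     rng: random.Random) -> tuple:
    """Add independent bounded noise to each axis (per-axis bound is an assumption)."""
    return (lat + truncated_laplace(scale, delta, rng),
            lon + truncated_laplace(scale, delta, rng))


rng = random.Random(42)
blurred = blur_coordinates(39.9, 116.4, scale=0.001, delta=0.002, rng=rng)
```

Rejection sampling keeps the shape of the Laplace density inside the bound, whereas simply clipping out-of-range draws would pile probability mass onto the two boundary values.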
[0099] In addition, the specific process of desensitizing device identifiers described above is as follows:
[0100] First, for each user within the coverage area of the target edge node, a preset first random number is determined, wherein the value of the first random number is between 0 and 1.
[0101] Next, calculate the cumulative probability of different device pseudo-identifiers using the following formula:
[0102]
[0103] In the formula, r represents the user's device identifier, o represents the device pseudo-identifier, $\varepsilon_{edge}$ represents the first target privacy protection budget, d(o, r) represents the distance measure between the device pseudo-identifier and the user's device identifier, Z represents the normalization constant, and protocolCheck(o) represents the compliance check function, which performs a compliance check on the device pseudo-identifier: if the pseudo-identifier complies, the output is the first valid value (e.g., 1); if it does not, the output is the second valid value (e.g., 0).
[0104] Finally, the first device pseudo-identifier whose cumulative probability is greater than the first random number is used as the de-identified replacement for the user's device identifier.
[0105] For example, if a user's device identifier is their IP address, such as 192.168.1.105, then when de-identifying this IP address the protocolCheck constraints are: (1) it must conform to the internal network IP format 192.168.x.x; (2) the last two segments must be integers from 1 to 255. Following the above method, a set of compliant candidate device pseudo-identifiers can be obtained, such as 192.168.45.231, 192.168.89.12, and 192.168.203.78. Finally, one candidate is selected from this set as the de-identified pseudo-IP address. This ensures that the de-identified device pseudo-identifier still complies with 3GPP and other communication standards, supporting secure processing of sensitive information such as MAC addresses and IMSI. This not only protects data privacy but also ensures seamless data transmission and processing under different network protocols, a feature generally lacking in existing technologies.
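The selection of a pseudo-identifier by cumulative probability can be sketched as follows. The probability form used here, a weight proportional to exp(−ε_edge · d(o, r)) multiplied by protocolCheck(o) and normalized by Z, is an assumption in the style of the exponential mechanism and is not confirmed by this application; the distance function `octet_distance` is likewise hypothetical.

```python
import math
from typing import Callable, Sequence


def protocol_check(ip: str) -> int:
    """Return 1 if ip has the form 192.168.x.x with the last two segments in 1..255, else 0."""
    parts = ip.split(".")
    if len(parts) != 4 or parts[:2] != ["192", "168"]:
        return 0
    return int(all(p.isdecimal() and 1 <= int(p) <= 255 for p in parts[2:]))


def octet_distance(o: str, r: str) -> float:
    """Hypothetical d(o, r): mean absolute difference of the last two octets, scaled to [0, 1]."""
    a = [int(x) for x in o.split(".")[2:]]
    b = [int(x) for x in r.split(".")[2:]]
    return sum(abs(x - y) for x, y in zip(a, b)) / (2 * 255)


def sample_pseudo_identifier(real_id: str, candidates: Sequence[str], eps_edge: float,
                             u: float,
                             distance: Callable[[str, str], float] = octet_distance) -> str:
    """Return the first candidate whose cumulative probability exceeds u (0 <= u < 1)."""
    # Non-compliant candidates get weight 0 and are never scored by the distance.
    weights = [math.exp(-eps_edge * distance(o, real_id)) if protocol_check(o) else 0.0
               for o in candidates]
    z = sum(weights)  # normalization constant
    if z == 0:
        raise ValueError("no compliant candidate")
    cumulative = 0.0
    for o, w in zip(candidates, weights):
        cumulative += w / z
        if cumulative > u:
            return o
    # Rounding can leave the final cumulative sum just below 1; fall back to
    # the last compliant candidate in that case.
    return [o for o, w in zip(candidates, weights) if w > 0][-1]


candidates = ["192.168.45.231", "192.168.89.12", "192.168.203.78", "10.0.0.1"]
chosen = sample_pseudo_identifier("192.168.1.105", candidates, eps_edge=1.2, u=0.0)
```

Because a non-compliant candidate contributes zero probability, it never advances the cumulative sum and therefore can never be the one selected.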
[0106] Optionally, in the technical solution provided in step S1063 above, the secondary privacy protection operation includes:
[0107] Step 1: For each user within the target edge node's coverage area, calculate the proportion λ of adding Gaussian noise to the user's privacy dataset using the following formula:
[0108]
[0109] In the formula, γ represents the weight parameter, ACC represents the control parameter, and the value of ACC changes continuously during the convergence process of the privacy-preserving decision model. Specifically, the value of ACC in the early stage of convergence of the privacy-preserving decision model is smaller than that in the later stage of convergence. Therefore, in the early stage of model training, using Laplace noise (which has a steeper distribution and causes greater data disturbance) can quickly introduce sufficient randomness, prompting the federated learning model to converge faster and find a preliminary global optimum. As the model training progresses, adding Gaussian noise (which has a smoother distribution and causes less data disturbance, but can maintain higher data utility) in the later stage of convergence helps to improve the accuracy of the model and reduce the performance degradation caused by excessive noise.
[0110] Determine the relationship between the proportion λ and the preset second random number, where the value of the second random number is between 0 and 1.
[0111] If the proportion λ is not less than the second random number, Gaussian noise corresponding to each sub-time period is generated based on the second target privacy protection budget and the spatiotemporal sensitivity corresponding to each sub-time period, and the Gaussian noise corresponding to each sub-time period is added to the user's privacy dataset in the corresponding sub-time period.
[0112] If the proportion λ is less than the second random number, Laplace noise corresponding to each sub-time period is generated based on the second target privacy protection budget and the spatiotemporal sensitivity corresponding to each sub-time period, and the Laplace noise corresponding to each sub-time period is added to the user's privacy dataset in the corresponding sub-time period.
[0113] In the aforementioned secondary privacy protection process, Laplace noise is considered to have a strong defensive effect against statistical attacks (such as differential attacks), protecting the basic distribution characteristics of data from being exploited by attackers; while Gaussian noise helps prevent model-based reverse inference attacks. Therefore, under high network load, increasing the proportion of Laplace noise can provide stronger privacy protection, preventing sensitive data from being intercepted by potential attackers due to network congestion; while under good network conditions, the proportion of Gaussian noise can be appropriately increased to reduce the degree of data obfuscation, thereby reducing communication latency between central and edge nodes and improving real-time processing capabilities. In other words, by dynamically adjusting the ratio of Gaussian and Laplace noise, this embodiment can adapt to various unforeseen changes in network scenarios, such as sudden data leakage risks and abnormal traffic, and enhance the system's resilience by adjusting the noise type and parameters in real time.
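The choice between the two noise types can be sketched as a comparison of λ with the second random number. Here λ and the noise scale are passed in directly, since their formulas are not reproduced in this sketch, and using one shared `scale` for both noise types is a simplifying assumption.

```python
import random
from typing import Tuple


def secondary_noise(lam: float, u: float, scale: float,
                    rng: random.Random) -> Tuple[str, float]:
    """Pick Gaussian noise when lam >= u, otherwise Laplace noise."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    if lam >= u:
        return "gaussian", rng.gauss(0.0, scale)
    # Laplace(0, scale) drawn as an exponential magnitude with a random sign.
    magnitude = rng.expovariate(1.0 / scale)
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return "laplace", sign * magnitude


rng = random.Random(0)
kind_late, _ = secondary_noise(lam=0.7, u=0.3, scale=1.0, rng=rng)   # later in convergence
kind_early, _ = secondary_noise(lam=0.2, u=0.3, scale=1.0, rng=rng)  # early in convergence
```

When the second random number is drawn uniformly from [0, 1), λ acts as the probability of choosing Gaussian noise, so a λ that grows during convergence gradually shifts the mix from Laplace toward Gaussian.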
[0114] Through the above steps, it can be seen that the embodiments of this application propose a dynamically layered privacy protection model, which can adaptively adjust the privacy protection strength of edge nodes and central nodes according to the real-time state of the network and the spatiotemporal characteristics of user distribution. This mechanism effectively resolves the contradiction between privacy protection and network performance optimization in the prior art, and improves the efficiency of privacy protection and data utility.
[0115] Embodiment 2
[0116] According to an embodiment of this application, a privacy data protection device for communication networks is also provided for implementing the privacy data protection method for communication networks in Embodiment 1. As shown in Figure 2, the privacy data protection device for communication networks includes at least: an acquisition module 22, a decision module 24, and a privacy protection module 26, wherein:
[0117] The acquisition module 22 is used to acquire network operation status information of the communication network within a preset time period, and to acquire user privacy datasets corresponding to target edge nodes within the communication network.
[0118] Decision module 24 is used to analyze network operation status information and user privacy datasets using a pre-trained privacy protection decision model to obtain a target privacy budget allocation strategy that minimizes communication latency and meets preset privacy protection requirements. The privacy protection decision model is trained based on a dual deep Q-learning algorithm and Stackelberg game theory. The target privacy budget allocation strategy includes at least: a first target privacy protection budget allocated to the target edge nodes and a second target privacy protection budget allocated to the center nodes.
[0119] Privacy protection module 26 is used to perform privacy protection processing on user privacy datasets according to the target privacy budget allocation strategy.
[0120] The following section describes the functions of each module in the privacy data protection device for communication networks, using a specific implementation process as an example.
[0121] As an optional implementation, the acquisition module 22 can capture network operation status information of the communication network in real time through built-in sensors or external interfaces, including but not limited to: bandwidth utilization, queue waiting time, attack threat level, historical attack frequency, risk of sensitive field exposure, edge CPU utilization, central memory margin, storage I / O pressure, network privacy budget margin, data signal-to-noise ratio, spatiotemporal risk index, etc.
[0122] Simultaneously, the acquisition module 22 can also analyze the geographical distribution and base station configuration information of the communication network, determine the service area boundary of the target edge node, and monitor and record the privacy dataset of all users within the service area of the target edge node, including but not limited to: real-time location coordinates (containing longitude, latitude, and real-time timestamp), device identifiers (such as the Media Access Control (MAC) address, International Mobile Equipment Identity (IMEI), and IP address), and network signaling data (Reference Signal Received Power (RSRP), Signal-to-Interference-plus-Noise Ratio (SINR), Channel Quality Indicator (CQI), etc.). This data can be obtained directly from the user's terminal device or inferred indirectly through base station and network signaling. Furthermore, since there is a close correlation between user location, time, and network behavior in mobile communication scenarios, this application adds spatiotemporal labels to the real-time location coordinates of each user to facilitate subsequent analysis algorithms in exploring these correlations.
[0123] GeoHash encoding is used to discretize the real-time location coordinates of each user, and then combined with a second-level timestamp to form a triple of {longitude, latitude, time}, which serves as the spatiotemporal label for the location data. This transforms longitude, latitude, and real-time timestamp into a standardized spatiotemporal feature vector, enabling precise association of the spatial location and temporal order of data points in subsequent processing. Finally, the processed privacy datasets of multiple users within the coverage area of the target edge node are combined to form the user privacy dataset corresponding to the target edge node.
[0124] Specifically, the process of adding spatiotemporal labels to real-time location coordinates can be implemented by the following steps:
[0125] Step 1: Obtain input parameters (longitude lon, latitude lat, real-time timestamp);
[0126] Step 2: Divide the real-time timestamp into discrete time units according to a preset time granularity. For example, if the preset time granularity is 5 minutes (300 seconds), then the discrete time unit can be represented as: t = ⌊timestamp / 300⌋, i.e., the timestamp in seconds divided by 300 and rounded down;
[0127] Step 3: Apply Geohash encoding to map (longitude lon, latitude lat) to fixed-precision two-dimensional grid coordinates (x, y). The precision of the Geohash encoding can be set according to the actual application scenario, such as 6 bits, 8 bits, etc.
[0128] Step 4: Output the discrete-time unit encoding t and the two-dimensional grid coordinates (based on Geohash two-dimensional integer encoding), i.e., output (x, y, t).
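The labeling steps above can be sketched as follows. For brevity, the grid step uses a simplified stand-in for Geohash: each axis is divided into 2^bits equal cells, which mirrors the per-axis bisection that Geohash performs but does not produce a base-32 Geohash string. The function name and the default `bits` value are assumptions.

```python
from typing import Tuple


def spatiotemporal_label(lon: float, lat: float, timestamp: float,
                         bits: int = 15, granularity: int = 300) -> Tuple[int, int, int]:
    """Map (lon, lat, timestamp in seconds) to integer grid and time cells (x, y, t)."""
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("coordinates out of range")
    cells = 1 << bits  # number of cells per axis
    # min(...) keeps the upper edge (lon = 180 or lat = 90) inside the last cell.
    x = min(cells - 1, int((lon + 180.0) / 360.0 * cells))
    y = min(cells - 1, int((lat + 90.0) / 180.0 * cells))
    t = int(timestamp) // granularity  # discrete time unit, rounded down
    return x, y, t


label = spatiotemporal_label(lon=0.0, lat=0.0, timestamp=1500, bits=4)
```

Two location records receive the same label when they fall in the same grid cell and the same 5-minute unit, which is what allows later processing to associate data points by place and time.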
[0129] As an optional implementation, the privacy data protection device for communication networks in this application embodiment further includes a model training module, and the model training module can train a privacy protection decision model according to the following steps:
[0130] First, an online Q-network and a target Q-network (with identical network structures) are constructed to solve the target privacy budget allocation strategy, and the network weight parameters of the online Q-network and the target Q-network are initialized. The online Q-network evaluates the value of actions in the current state, while the target Q-network predicts the value of future states; their weights are initialized with the same value to ensure consistency in the starting point of training.
[0131] Next, an experience pool is set up and its capacity is determined. The experience pool is used to store past state-action-reward-new state (SARS) sequence samples, and the capacity of the experience pool is preset to control the stored sequence samples and ensure that the model can learn from sufficient experience.
[0132] Determine the first preset number of iteration cycles (i.e., the number of iterations to perform multiple loop optimizations on the policy), and iteratively solve the privacy budget allocation policy and network weight parameters through the following steps:
[0133] Within each iteration cycle, the communication network environment state is initialized, and a second preset number of calculation cycles is determined (i.e., the number of repetitions of real-time state evaluation and action selection in each iteration). The communication network environment state includes at least: the user privacy dataset and network operation status information corresponding to any edge node in the communication network.
[0134] Each calculation cycle includes the following steps:
[0135] Step 1: Determine the second initial privacy protection budget for the central node based on the current communication network environment, and determine the first initial privacy protection budget for the target edge node in combination with the preset total privacy budget. Determine the initial privacy budget allocation strategy (i.e., the starting point of the game) based on the first and second initial privacy protection budgets.
[0136] Step 2: Based on the initial privacy budget allocation strategy, Stackelberg game theory is used to conduct multiple rounds of games to obtain various privacy budget allocation strategies.
[0137] Step 3: Input the current communication network environment status into the online Q network and calculate the predicted Q value corresponding to each privacy budget allocation strategy.
[0138] Step 4: Select the target privacy budget allocation strategy with the maximum predicted Q value based on the greedy strategy. The target privacy budget allocation strategy is the privacy budget allocation strategy that minimizes communication latency and meets the preset privacy protection requirements.
[0139] Step 5: Perform corresponding privacy processing (such as noise injection, data obfuscation, etc.) according to the target privacy budget allocation strategy, and obtain corresponding rewards (such as reduction of privacy leakage risk, changes in network latency, etc.) and a new state of the communication network environment;
[0140] Step 6: Take the current communication network environment status, target privacy budget allocation strategy, rewards, and new communication network environment status as a sample, and then store the sample in the experience pool;
[0141] Step 7: Randomly sample multiple samples from the experience pool for learning by the online Q-network. The probability of each sample being drawn is related to its importance, which focuses learning on the experience fragments that yield the greatest learning gains. The neural network then calculates, for each sample, the mean squared error loss function and the loss function weight, so that the contribution of each sample is accurately reflected. Based on the calculation results, the target Q value corresponding to the second target privacy protection budget is then determined.
[0142] Step 8: Based on the predicted Q-value and target Q-value corresponding to the target privacy budget allocation strategy, update the network weight parameters of the online Q-network through backpropagation and gradient descent to optimize the model's predictive ability.
[0143] Then, after a third preset number of computation cycles (usually less than the second preset number), the system can update the network weight parameters of the target Q-network based on the network weight parameters of the online Q-network. This helps to smooth the prediction curve of the target network, prevent model overfitting, and improve stability. As training progresses, the model gradually learns how to optimally allocate the first privacy-preserving budget to edge nodes and the second privacy-preserving budget to center nodes, given a total privacy-preserving budget.
[0144] Finally, after iterative training, the resulting target Q-network is used as a privacy protection decision model to guide the formulation of privacy protection strategies in real-time network environments, ensuring that network performance is maximized while maintaining data security.
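As an illustrative sketch only, and not part of the disclosed embodiments, the importance-based sampling, target Q-value computation, and weighted loss described in Steps 7 and 8 might look as follows in Python. The sketch assumes that "dual deep Q-learning" refers to the standard Double DQN update and that importance-based sampling follows proportional prioritized experience replay. The function names, the discount factor gamma, and the exponents alpha and beta are hypothetical and do not appear in this application.

```python
import random


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online network selects the next action,
    the target network evaluates that action."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def sample_prioritized(priorities, k, rng, alpha=0.6, beta=0.4):
    """Draw k sample indices with probability proportional to priority**alpha
    and return importance-sampling weights normalized to a maximum of 1."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    probs = [s / total for s in scaled]
    n = len(priorities)
    indices = rng.choices(range(n), weights=probs, k=k)
    raw = [(n * probs[i]) ** (-beta) for i in indices]
    max_w = max(raw)
    return indices, [w / max_w for w in raw]


def weighted_mse(predicted, targets, weights):
    """Mean squared error in which each sample is scaled by its loss weight."""
    terms = [w * (p - t) ** 2 for p, t, w in zip(predicted, targets, weights)]
    return sum(terms) / len(terms)


# Hypothetical usage: one target value and one sampled mini-batch.
rng = random.Random(0)
y = double_dqn_target(1.0, [0.2, 0.9, 0.1], [0.5, 0.3, 0.8], gamma=0.5)
batch_indices, batch_weights = sample_prioritized([0.1, 2.0, 0.5, 1.0], k=2, rng=rng)
```

Under this reading, copying the online weights into the target network only once every third preset number of calculation cycles keeps the target values fixed between copies, which is what stabilizes training.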
[0145] Specifically, the objective function of the Stackelberg game process described above can be written as:
[0146] $\max_{\varepsilon_{edge}} \min_{\varepsilon_{center}} \left[ \mathrm{PrivacyRisk} + \eta \cdot \mathrm{CommDelay} \right]$
[0147] In the formula, PrivacyRisk represents the privacy risk value, i ranges over all nodes participating in privacy protection within the system (including the central node and privacy nodes), and CommDelay represents the communication latency. QueueLength represents queue latency (the waiting time of privacy data before it is processed by edge nodes); queue latency depends on factors such as network load, packet size, and the processing capacity of network devices. DataSize represents transmission latency (the time required for privacy data to be transmitted from the terminal to the central node); transmission latency is affected by the characteristics of the transmission medium (such as electromagnetic wave propagation speed and fiber optic transmission speed), signal interference, channel conditions (such as bandwidth and signal strength), and the size of the data packet itself. In addition, η and ω represent weight parameters.
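For illustration only, the max-min structure of the objective above can be evaluated by brute force over discrete candidate budgets. The following Python sketch assumes finite candidate grids and a caller-supplied objective. The name `solve_max_min` and the toy risk and delay terms are hypothetical placeholders, since the expressions for PrivacyRisk and CommDelay are not reproduced in this text.

```python
def solve_max_min(edge_candidates, center_candidates, objective):
    """For each edge budget, find the center budget that minimizes the
    objective (inner min), then keep the edge budget whose inner
    minimum is largest (outer max)."""
    best = None
    for eps_edge in edge_candidates:
        eps_center = min(center_candidates, key=lambda c: objective(eps_edge, c))
        value = objective(eps_edge, eps_center)
        if best is None or value > best[2]:
            best = (eps_edge, eps_center, value)
    return best


def toy_objective(eps_edge, eps_center, eta=0.5):
    """Hypothetical stand-in for PrivacyRisk + eta * CommDelay."""
    privacy_risk = eps_edge + eps_center  # placeholder, not the patent's formula
    comm_delay = 1.0 / eps_center         # placeholder, not the patent's formula
    return privacy_risk + eta * comm_delay


# Hypothetical usage over small candidate grids.
result = solve_max_min([0.2, 0.5, 1.0], [0.2, 0.5, 1.0], toy_objective)
```

An exhaustive search of this kind is only practical for small grids; the multi-round game in the embodiment would replace it with iterative best responses.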
[0148] Optionally, the model training module calculates the predicted Q-value corresponding to each privacy budget allocation strategy in the following way:
[0149] Step 1: For each privacy budget allocation strategy, calculate the data utility corresponding to the privacy budget allocation strategy based on the user privacy dataset of the edge node, the network operation status information, and the privacy budget allocation strategy.
[0150] Data utility quantifies the impact of privacy protection measures on data availability, network performance, and user service experience. Therefore, when calculating the data utility corresponding to each privacy budget allocation strategy, the model training module can comprehensively reflect the utility value of data across different dimensions using the following utility evaluation metrics:
[0151] (1) Data accuracy: By comparing the data values before and after processing, the extent to which the strategy causes loss of data authenticity is evaluated.
[0152] (2) Data integrity: Check whether the data is complete and intact, including missing values, missing data fragments, etc.
[0153] (3) Data consistency: Evaluate the correlation or consistency of processed data at different points in time or between different data streams.
[0154] (4) Network performance indicators: such as QoS indicators, including latency, throughput, packet loss rate, etc., which reflect the impact of policies on network service quality.
[0155] (5) User behavior prediction accuracy: The accuracy of predicting user behavior patterns is evaluated by analyzing the processed data.
[0156] Then, time series analysis, regression models, or deep learning algorithms are used to make long-term predictions of the utility of the simulated data. The predicted utility indicators are converted into numerical values through normalization or standardization methods. The prediction results of different utility indicators are then merged into an overall long-term data utility value through weighted averaging or comprehensive scoring. In the weighting process, the weights corresponding to the prediction results of different utility indicators should be determined according to the specific application scenario and importance. This application does not impose specific restrictions on this.
[0157] Step 2: Determine the predicted Q-values for each privacy budget allocation strategy based on its long-term data utility. The predicted Q-values reflect the long-term expected benefits of adopting a particular privacy budget allocation strategy under the current communication network environment.
[0158] In other words, the model training module maps the long-term data utility value corresponding to each privacy budget allocation strategy to a Q-value, ensuring that the model focuses not only on immediate rewards during the learning process but also, and more importantly, on the long-term effects of the strategy. Furthermore, by comparing the long-term data utility and predicted Q-value under different privacy budget allocation strategies, the privacy-preserving decision-making model can select strategies that minimize the impact on data utility and network performance while protecting privacy. This process is repeated, allowing the model's predictive ability to improve through comparison with actual long-term data utility. Simultaneously, the privacy budget allocation strategy is dynamically adjusted and optimized through this cycle, ensuring a continuous balance between privacy protection and data utility.
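The normalization and weighted averaging of utility indicators described above can be sketched as follows. This is a minimal Python illustration that assumes min-max normalization and a plain weighted mean. The indicator names, bounds, and weights are hypothetical, since this application leaves the weighting to the specific application scenario.

```python
def normalize(value, low, high, higher_is_better=True):
    """Min-max normalize an indicator into [0, 1]; indicators where a
    smaller raw value is better (e.g. latency) are inverted."""
    if high == low:
        raise ValueError("normalization bounds must differ")
    score = (value - low) / (high - low)
    score = min(1.0, max(0.0, score))
    return score if higher_is_better else 1.0 - score


def overall_utility(scores, weights):
    """Weighted average of normalized indicator scores."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * score for name, score in scores.items()) / total_weight


# Hypothetical usage with two of the five indicator types.
scores = {
    "data_accuracy": normalize(0.92, 0.0, 1.0),
    "latency_ms": normalize(20.0, 0.0, 100.0, higher_is_better=False),
}
utility = overall_utility(scores, {"data_accuracy": 3.0, "latency_ms": 1.0})
```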
[0159] Furthermore, the decision module 24 can call the privacy protection decision model that has been trained above to analyze the network operation status information and user privacy dataset to obtain a target privacy budget allocation strategy that minimizes communication latency and meets preset privacy protection requirements.
[0160] As an optional implementation, the privacy protection module 26 can perform privacy protection processing on the user privacy dataset corresponding to the target edge node using the following methods:
[0161] Step S1061: Divide the preset time period into multiple sub-time periods according to the preset time window, and determine the spatiotemporal sensitivity of the coverage area of the target edge node in each sub-time period according to the following formula:
[0162] $\Delta_T = \alpha \cdot \Delta_{base} + \beta \cdot \log\left(1 + N_T / N_{avg}\right)$
[0163] In the formula, $\Delta_T$ represents the spatiotemporal sensitivity of the coverage area of the target edge node within sub-time period T ($\Delta_T$ reflects data sensitivity under different time and space conditions according to changes in user distribution density and network conditions, and a larger value indicates higher sensitivity); $\Delta_{base}$ represents the preset spatiotemporal sensitivity baseline value (e.g., set to 1, obtained from historical data statistics); $N_T$ represents the number of concurrent users within the coverage area of the target edge node during sub-time period T (obtained through real-time statistics to reflect user density within sub-time period T; the larger $N_T$ is, the stronger the data correlation and the higher the privacy risk); $N_{avg}$ represents the average number of concurrent users in the coverage area of the target edge node within the preset time period (obtained through time-window statistics and used to reflect the average user density in the same sub-time period in the past); α and β represent weight parameters (used to control the balance between the static baseline and the dynamic adjustment), and α+β=1.
[0164] Through the above time aggregation processing, massive real-time data streams can be segmented and processed to output aggregation results in a timely manner, thereby compressing the data size and reducing storage and transmission overhead.
[0165] Step S1062: Perform initial privacy protection processing on the user privacy dataset corresponding to the target edge node based on the first target privacy protection budget and the spatiotemporal sensitivity corresponding to each sub-time period. The initial privacy protection processing includes at least: noise addition processing and desensitization processing.
[0166] Step S1063: Perform secondary privacy protection processing on the user privacy dataset corresponding to the target edge node based on the second target privacy protection budget and the spatiotemporal sensitivity corresponding to each sub-time period. The secondary privacy protection processing includes at least: noise addition processing.
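As a worked illustration of the spatiotemporal sensitivity formula in step S1061, the following Python sketch computes the sensitivity for one sub-time period. It assumes the logarithm is the natural logarithm, which this application does not state. The function name and default values are hypothetical.

```python
import math


def spatiotemporal_sensitivity(n_t, n_avg, base=1.0, alpha=0.5):
    """Sensitivity = alpha * base + beta * log(1 + n_t / n_avg),
    with beta = 1 - alpha so that alpha + beta = 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if n_avg <= 0:
        raise ValueError("n_avg must be positive")
    beta = 1.0 - alpha
    return alpha * base + beta * math.log(1.0 + n_t / n_avg)


# Hypothetical usage: a sub-time period with twice the average user count.
busy = spatiotemporal_sensitivity(n_t=200, n_avg=100)
quiet = spatiotemporal_sensitivity(n_t=20, n_avg=100)
```

With base = 1 and α = β = 0.5, a sub-time period whose concurrent user count exceeds roughly 1.72 times the average (e − 1) yields a sensitivity above the baseline value.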
[0167] Optionally, in the technical solution provided in step S1062 above, the initial privacy protection operation includes:
[0168] For each sub-time period, determine the relationship between the spatiotemporal sensitivity within the sub-time period and the spatiotemporal sensitivity baseline value;
[0169] If the spatiotemporal sensitivity corresponding to the sub-time period is not less than the spatiotemporal sensitivity benchmark value, add a first amount of truncated Laplace noise to the spatiotemporal location data of each user under the target edge node coverage area according to the first target privacy protection budget, and perform desensitization processing on the device identifier of each user to obtain the device pseudo-identifier.
[0170] If the spatiotemporal sensitivity corresponding to the sub-time period is less than the spatiotemporal sensitivity benchmark value, a second amount of truncated Laplace noise is added to the spatiotemporal location data of each user under the target edge node coverage area according to the first target privacy protection budget, and the device identifier of each user is desensitized to obtain the device pseudo-identifier.
[0171] The distribution width of the first truncated Laplace noise is smaller than that of the second truncated Laplace noise. This is because when the number of concurrent users in the coverage area of the target edge node is high (i.e., the area is densely populated), the spatiotemporal sensitivity is high, the noise scale is small, and therefore relatively less noise is added to reduce data distortion. Conversely, when the number of concurrent users in the coverage area of the target edge node is low (i.e., the area is sparsely populated), the spatiotemporal sensitivity is low, the noise scale is large, and more noise is added to enhance privacy protection. This noise addition strategy based on real-time spatiotemporal sensitivity adjustment ensures reduced utility loss in data-dense areas and enhanced privacy protection in sparse areas. It overcomes the problem of excessive utility loss caused by fixed sensitivity in existing technologies, improves the availability of location coordinate information, and makes the privacy protection strategy more accurate.
[0172] Specifically, the process of adding truncated Laplace noise is as follows: First, the system determines the coordinate offset error δ of the differential privacy protection mechanism based on the application scenario requirements and privacy protection needs. Then, based on the first target privacy protection budget allocated to the target edge node and the spatiotemporal sensitivity corresponding to each sub-time period, the system calculates the scale parameter of the truncated Laplace noise for each sub-time period; this scale parameter determines the noise distribution width. Next, the amount of truncated Laplace noise is determined based on the amount of data in the user privacy dataset within each sub-time period, and the resulting truncated Laplace noise is added to the original data to obtain the blurred data, thus ensuring that each piece of data receives appropriate privacy protection. For example, for the original coordinates (lat, lon), the blurred coordinates are (lat + $noise_{lat}$, lon + $noise_{lon}$), where $noise_{lat}$ and $noise_{lon}$ are generated from a truncated Laplace distribution. Finally, to prevent the offset error of the blurred location coordinates from exceeding the coordinate offset error δ due to the added noise, the location coordinates after adding noise can be constrained by adjusting them to the nearest valid coordinate point. This ensures that the offset error of the blurred location coordinates does not exceed δ, thereby greatly reducing geographic distortion. Furthermore, the size of the blurred data is smaller, which can significantly reduce the cost of data transmission.
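A minimal Python sketch of coordinate blurring with truncated Laplace noise follows. It is an illustration under stated assumptions rather than the embodiment itself: the scale parameter is passed in directly, because its derivation from the privacy budget and the spatiotemporal sensitivity is not given here; truncation is done by rejection sampling; and the bound δ is applied to each coordinate axis separately. The function names are hypothetical.

```python
import random


def truncated_laplace(scale, bound, rng):
    """Draw zero-mean Laplace noise with the given scale, redrawing
    until the sample falls inside [-bound, bound]."""
    if scale <= 0 or bound <= 0:
        raise ValueError("scale and bound must be positive")
    while True:
        # The difference of two i.i.d. exponential variables with mean
        # `scale` follows a Laplace(0, scale) distribution.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        if abs(noise) <= bound:
            return noise


def blur_coordinates(lat, lon, scale, delta, rng):
    """Add independent truncated Laplace noise to each coordinate so that
    the per-axis offset never exceeds delta."""
    return (lat + truncated_laplace(scale, delta, rng),
            lon + truncated_laplace(scale, delta, rng))


# Hypothetical usage with offsets expressed in degrees.
rng = random.Random(42)
blurred = blur_coordinates(39.9042, 116.4074, scale=0.001, delta=0.002, rng=rng)
```

Rejection sampling keeps the shape of the Laplace distribution inside the bound, whereas clamping out-of-range samples to ±δ would concentrate probability mass at the edges; either choice keeps the per-axis offset within δ.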
[0173] In addition, the specific process of desensitizing device identifiers described above is as follows:
[0174] First, for each user within the coverage area of the target edge node, a preset first random number is determined, wherein the value of the first random number is between 0 and 1.
[0175] Next, calculate the cumulative probability of different device pseudo-identifiers using the following formula:
[0176]
[0177] In the formula, r represents the user's device identifier, o represents the device pseudo-identifier, $\varepsilon_{edge}$ represents the first target privacy protection budget, d(o,r) represents the distance measure between the device pseudo-identifier and the user's device identifier, Z represents the normalization constant, and protocolCheck(o) represents the compliance check function, which performs a compliance check on the device pseudo-identifier: if the pseudo-identifier is compliant, the output is the first valid value (e.g., 1); otherwise, the output is the second valid value (e.g., 0).
[0178] Finally, the first device pseudo-identifier whose cumulative probability is greater than the first random number is used as the device pseudo-identifier obtained by desensitizing the user's device identifier.
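The selection of a pseudo-identifier by cumulative probability can be sketched in Python as follows. Because the probability formula itself is not reproduced in this text, the sketch assumes an exponential-mechanism-style weight, exp(−ε_edge · d(o, r)) · protocolCheck(o), normalized by Z. That form, the function name, and the candidate list are assumptions for illustration only.

```python
import math


def sample_pseudo_identifier(real_id, candidates, epsilon_edge,
                             distance, protocol_check, u):
    """Return the first candidate whose cumulative probability exceeds u,
    where u is a random number in [0, 1)."""
    # Assumed weight form: non-compliant candidates get weight 0.
    weights = [math.exp(-epsilon_edge * distance(o, real_id)) * protocol_check(o)
               for o in candidates]
    z = sum(weights)  # normalization constant Z
    if z == 0:
        raise ValueError("no compliant pseudo-identifier candidate")
    cumulative = 0.0
    chosen = None
    for candidate, weight in zip(candidates, weights):
        if weight == 0:
            continue
        chosen = candidate  # remember the last compliant candidate
        cumulative += weight / z
        if cumulative > u:
            return candidate
    return chosen  # guards against floating-point rounding near u = 1


# Hypothetical usage: candidate "b" fails the compliance check.
picked = sample_pseudo_identifier(
    real_id="dev-001",
    candidates=["a", "b", "c"],
    epsilon_edge=1.0,
    distance=lambda o, r: 0.0,
    protocol_check=lambda o: 0 if o == "b" else 1,
    u=0.75,
)
```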
[0179] Optionally, in the technical solution provided in step S1063 above, the secondary privacy protection operation includes:
[0180] Step 1: For each user within the target edge node's coverage area, calculate the proportion λ of adding Gaussian noise to the user's privacy dataset using the following formula:
[0181]
[0182] In the formula, γ represents the weight parameter, ACC represents the control parameter, and the value of ACC changes continuously during the convergence process of the privacy-preserving decision model. Specifically, the value of ACC in the early stage of convergence of the privacy-preserving decision model is smaller than that in the later stage of convergence. Therefore, in the early stage of model training, using Laplace noise (which has a steeper distribution and causes greater data disturbance) can quickly introduce sufficient randomness, prompting the federated learning model to converge faster and find a preliminary global optimum. As the model training progresses, adding Gaussian noise (which has a smoother distribution and causes less data disturbance, but can maintain higher data utility) in the later stage of convergence helps to improve the accuracy of the model and reduce the performance degradation caused by excessive noise.
[0183] Determine the relationship between the proportion λ and the preset second random number, where the value of the second random number is between 0 and 1.
[0184] If the proportion λ is not less than the second random number, Gaussian noise corresponding to each sub-time period is generated based on the second target privacy protection budget and the spatiotemporal sensitivity corresponding to each sub-time period, and the Gaussian noise corresponding to each sub-time period is added to the user's privacy dataset in the corresponding sub-time period.
[0185] If the proportion λ is less than the second random number, Laplace noise corresponding to each sub-time period is generated based on the second target privacy protection budget and the spatiotemporal sensitivity corresponding to each sub-time period, and the Laplace noise corresponding to each sub-time period is added to the user's privacy dataset in the corresponding sub-time period.
[0186] In the aforementioned secondary privacy protection process, Laplace noise is considered to have a strong defensive effect against statistical attacks (such as differential attacks), protecting the basic distribution characteristics of data from being exploited by attackers; while Gaussian noise helps prevent model-based reverse inference attacks. Therefore, under high network load, increasing the proportion of Laplace noise can provide stronger privacy protection, preventing sensitive data from being intercepted by potential attackers due to network congestion; while under good network conditions, the proportion of Gaussian noise can be appropriately increased to reduce the degree of data obfuscation, thereby reducing communication latency between central and edge nodes and improving real-time processing capabilities. In other words, by dynamically adjusting the ratio of Gaussian and Laplace noise, this embodiment can adapt to various unforeseen changes in network scenarios, such as sudden data leakage risks and abnormal traffic, and enhance the system's resilience by adjusting the noise type and parameters in real time.
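The choice between Gaussian and Laplace noise in the secondary privacy protection process can be sketched in Python as follows. The proportion λ is taken as an input, because its formula is not reproduced in this text, and the noise parameters are passed in directly rather than derived from the second target privacy protection budget. The function and parameter names are hypothetical.

```python
import random


def secondary_noise(lam, gaussian_sigma, laplace_scale, rng):
    """Compare lam with a random number in [0, 1): if lam is not less than
    the random number, draw Gaussian noise; otherwise draw Laplace noise."""
    u = rng.random()
    if lam >= u:
        return "gaussian", rng.gauss(0.0, gaussian_sigma)
    # The difference of two i.i.d. exponential variables with mean
    # `laplace_scale` follows a Laplace(0, laplace_scale) distribution.
    noise = (rng.expovariate(1.0 / laplace_scale)
             - rng.expovariate(1.0 / laplace_scale))
    return "laplace", noise


# Hypothetical usage: with lam = 0.7, about 70% of draws are Gaussian.
rng = random.Random(0)
draws = [secondary_noise(0.7, 1.0, 1.0, rng)[0] for _ in range(1000)]
gaussian_share = draws.count("gaussian") / len(draws)
```

Treating λ as a probability in this way means that a larger λ shifts the mixture toward Gaussian noise, which matches the described behavior in the later stage of convergence.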
[0187] It should be noted that each module in the privacy data protection device for communication networks in this application corresponds one-to-one with each implementation step of the privacy data protection method for communication networks in Embodiment 1. Since Embodiment 1 has been described in detail, some details not shown in this embodiment can be referred to Embodiment 1, and will not be elaborated further here.
[0188] Embodiment 3
[0189] According to an embodiment of this application, a computer program product is also provided, which includes a computer program, wherein when the computer program is executed by a processor, it implements the privacy data protection method for communication networks in Embodiment 1.
[0190] According to an embodiment of this application, a non-volatile storage medium is also provided, which includes a stored computer program, wherein the device containing the non-volatile storage medium executes the privacy data protection method for communication networks in Embodiment 1 by running the computer program.
[0191] According to an embodiment of this application, a processor is also provided for running a computer program, wherein the computer program executes the privacy data protection method for communication networks in Embodiment 1.
[0192] According to an embodiment of this application, an electronic device is also provided, comprising: a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the privacy data protection method for communication networks in Embodiment 1 through the computer program.
[0193] Specifically, the computer program executes the following steps during runtime: acquiring network operation status information of the communication network within a preset time period, and acquiring user privacy datasets corresponding to target edge nodes within the communication network; analyzing the network operation status information and user privacy datasets using a pre-trained privacy protection decision model to obtain a target privacy budget allocation strategy that minimizes communication latency and meets preset privacy protection requirements. The privacy protection decision model is trained based on a dual-deep Q-learning algorithm and Stackelberg game theory. The target privacy budget allocation strategy includes at least: a first target privacy protection budget allocated to the target edge nodes and a second target privacy protection budget allocated to the center nodes; and performing privacy protection processing on the user privacy dataset according to the target privacy budget allocation strategy.
[0194] As an alternative implementation, the above-mentioned electronic device may exist in the form of a mobile terminal, a computer terminal, or a similar computing device. Figure 3 shows a hardware block diagram of an electronic device for implementing the privacy data protection method for communication networks. As shown in Figure 3, the electronic device 30 may include one or more processors 302 (shown as 302a, 302b, ..., 302n in the figure) (a processor 302 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 304 for storing data, and a transmission device 306 for communication functions. In addition, it may also include: a display, an input/output interface (I/O interface), a universal serial bus (USB) port (which may be included as one of the ports of the bus), a network interface, a power supply, and/or a camera. Those skilled in the art will understand that the structure shown in Figure 3 is for illustrative purposes only and does not limit the structure of the aforementioned electronic device. For example, the electronic device 30 may include more or fewer components than those shown in Figure 3, or have a configuration different from that shown in Figure 3.
[0195] It should be noted that the aforementioned one or more processors 302 and / or other data processing circuits are generally referred to herein as "data processing circuits". These data processing circuits may be embodied, in whole or in part, in software, hardware, firmware, or any other combination thereof. Furthermore, the data processing circuits may be a single, independent processing module, or may be integrated, in whole or in part, into any other element of the electronic device 30. As involved in the embodiments of this application, the data processing circuits serve as a processor control mechanism (e.g., selection of a variable resistor termination path connected to an interface).
[0196] The memory 304 can be used to store software programs and modules of application software, such as the program instructions / data storage device corresponding to the privacy data protection method for communication networks in this embodiment of the application. The processor 302 executes various functional applications and data processing by running the software programs and modules stored in the memory 304, thereby implementing the above-mentioned privacy data protection method for communication networks. The memory 304 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 304 may further include memory remotely located relative to the processor 302, and these remote memories can be connected to the electronic device 30 via a network. Examples of the above-mentioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
[0197] The transmission device 306 is used to receive or send data via a network. Specific examples of the network described above may include a wireless network provided by the communication provider of the electronic device 30. In one example, the transmission device 306 includes a Network Interface Controller (NIC), which can connect to other network devices via a base station to communicate with the Internet. In another example, the transmission device 306 may be a Radio Frequency (RF) module, used for wireless communication with the Internet.
[0198] The display may be, for example, a touchscreen liquid crystal display (LCD), which allows the user to interact with the user interface of the electronic device 30.
[0199] The sequence numbers of the above embodiments are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.
[0200] In the above embodiments of this application, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.
[0201] In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of units can be a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the displayed or discussed mutual couplings, direct couplings, or communication connections may be through some interfaces; indirect couplings or communication connections between units or modules may be electrical or other forms.
[0202] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0203] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0204] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), portable hard drive, magnetic disk, or optical disk.
[0205] The above are merely preferred embodiments of this application. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principles of this application, and these improvements and modifications should also be considered within the scope of protection of this application.
Claims
1. A method for protecting privacy data in communication networks, characterized in that it comprises: obtaining network operation status information of the communication network within a preset time period, and obtaining the user privacy dataset corresponding to a target edge node within the communication network; analyzing the network operation status information and the user privacy dataset using a pre-trained privacy protection decision model to obtain a target privacy budget allocation strategy that minimizes communication latency and meets preset privacy protection requirements, wherein the target privacy budget allocation strategy includes at least: a first target privacy protection budget allocated to the target edge node and a second target privacy protection budget allocated to the central node; wherein the privacy protection decision model is trained based on a dual deep Q-learning algorithm and Stackelberg game theory, and the training process includes: constructing an online Q-network and a target Q-network for solving the target privacy budget allocation strategy, and initializing the network weight parameters of the online Q-network and the target Q-network; setting up an experience pool and determining its capacity; determining a first preset number of iteration cycles, and iteratively solving the privacy budget allocation strategy and the network weight parameters through the following steps: within each iteration cycle, initializing the communication network environment state and determining a second preset number of calculation cycles, wherein the communication network environment state includes at least: the user privacy dataset and network operation status information corresponding to the target edge node within the communication network; within each calculation cycle, determining the second initial privacy protection budget for the central node based on the current communication network environment state, and determining the first initial privacy protection budget for the target edge node in combination with a preset total privacy budget; determining an initial privacy budget allocation strategy based on the first initial privacy protection budget and the second initial privacy protection budget; conducting multiple rounds of games based on the initial privacy budget allocation strategy using Stackelberg game theory to obtain various privacy budget allocation strategies; inputting the current communication network environment state into the online Q-network and calculating the predicted Q-value corresponding to each privacy budget allocation strategy; selecting, based on a greedy strategy, the target privacy budget allocation strategy with the largest predicted Q-value, wherein the target privacy budget allocation strategy is the one that minimizes communication latency and meets the preset privacy protection requirements; performing corresponding privacy processing according to the target privacy budget allocation strategy to obtain the corresponding reward and the new state of the communication network environment; taking the current communication network environment state, the target privacy budget allocation strategy, the reward, and the new state of the communication network environment as a sample, and storing the sample in the experience pool; inputting multiple samples randomly sampled from the experience pool into the neural network to calculate the probability of each sample being sampled, the mean squared error loss function, and the loss function weight, and determining, based on the calculation results, the target Q-value corresponding to the second target privacy protection budget; updating the network weight parameters of the online Q-network using the gradient descent method based on the predicted Q-value and the target Q-value corresponding to the target privacy budget allocation strategy; after a third preset number of calculation cycles, updating the network weight parameters of the target Q-network according to the network weight parameters of the online Q-network, wherein the third preset number is less than the second preset number; after the iteration is completed, using the resulting target Q-network as the privacy protection decision model; and performing privacy protection processing on the user privacy dataset in accordance with the target privacy budget allocation strategy.
2. The method according to claim 1, characterized in that obtaining the user privacy dataset corresponding to the target edge node within the communication network comprises: identifying multiple users within the coverage area of the target edge node; for each user, obtaining a privacy dataset for that user, wherein the privacy dataset includes at least: real-time location coordinates and a device identifier; adding spatiotemporal labels to the real-time location coordinates using GeoHash encoding to obtain spatiotemporal location data; and composing the user privacy dataset corresponding to the target edge node from the processed privacy datasets of the multiple users within the coverage area of the target edge node.
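As an illustration of the labelling step in claim 2, the following Python sketch encodes a coordinate pair with the standard GeoHash scheme and pairs it with a time-window index. The function names, the default `precision`, and the use of a window index as the temporal part of the label are assumptions made for this sketch; the claim does not specify how the temporal component is formed.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard GeoHash alphabet


def geohash_encode(lat, lon, precision=7):
    """Encode latitude/longitude as a GeoHash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, value = 0, 0
    use_lon = True  # GeoHash interleaves bits, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2.0
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value = value << 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2.0
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value = value << 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits map to one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def spatiotemporal_label(lat, lon, timestamp_s, window_s=300, precision=7):
    """Hypothetical label: (GeoHash cell, index of the time window)."""
    return geohash_encode(lat, lon, precision), int(timestamp_s // window_s)


if __name__ == "__main__":
    print(spatiotemporal_label(57.64911, 10.40744, 1000, window_s=300, precision=5))
```

A shorter `precision` yields a coarser cell, which is one way the spatial granularity of the label could be tuned.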
3. The method according to claim 1, characterized in that inputting the current communication network environment state into the online Q-network and calculating the predicted Q-value corresponding to each privacy budget allocation strategy comprises: for each privacy budget allocation strategy, calculating the data utility corresponding to that strategy based on the user privacy dataset of the edge node, the network operation status information, and the strategy itself; and determining the predicted Q-value corresponding to each privacy budget allocation strategy based on the long-term data utility value corresponding to that strategy.
4. The method according to claim 1, characterized in that subjecting the user privacy dataset to privacy protection processing according to the target privacy budget allocation strategy comprises: dividing the preset time period into multiple sub-time periods according to a preset time window, and determining the spatiotemporal sensitivity of the coverage area of the target edge node in each sub-time period according to the following formula: [formula not reproduced in the source text]; in the formula, the terms denote, in order: the spatiotemporal sensitivity of the coverage area of the target edge node in sub-time period T; the preset spatiotemporal sensitivity benchmark value; the number of concurrent users within the coverage area of the target edge node during sub-time period T; the average number of concurrent users within the coverage area of the target edge node over the preset time period; and the weight parameters (the constraint on the weight parameters is likewise not reproduced); based on the first target privacy protection budget and the spatiotemporal sensitivity corresponding to each sub-time period, performing initial privacy protection processing on the user privacy dataset corresponding to the target edge node, wherein the initial privacy protection processing includes at least: noise addition processing and desensitization processing; and based on the second target privacy protection budget and the spatiotemporal sensitivity corresponding to each sub-time period, performing secondary privacy protection processing on the user privacy dataset corresponding to the target edge node, wherein the secondary privacy protection processing includes at least: noise addition processing.
5. The method according to claim 4, characterized in that performing initial privacy protection processing on the user privacy dataset corresponding to the target edge node, based on the first target privacy protection budget and the spatiotemporal sensitivity corresponding to each sub-time period, comprises: for each sub-time period, comparing the spatiotemporal sensitivity within the sub-time period with the spatiotemporal sensitivity benchmark value; if the spatiotemporal sensitivity corresponding to the sub-time period is not less than the spatiotemporal sensitivity benchmark value, adding a first truncated Laplace noise to the spatiotemporal location data of each user within the coverage area of the target edge node according to the first target privacy protection budget, and desensitizing the device identifier of each user to obtain a pseudo-device identifier; if the spatiotemporal sensitivity corresponding to the sub-time period is less than the spatiotemporal sensitivity benchmark value, adding a second truncated Laplace noise to the spatiotemporal location data of each user within the coverage area of the target edge node according to the first target privacy protection budget, and desensitizing the device identifier of each user to obtain a pseudo-device identifier; wherein the distribution width of the first truncated Laplace noise is smaller than the distribution width of the second truncated Laplace noise.
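The branching in claim 5 can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the width factors `narrow` and `wide`, the truncation bound `bound_factor`, and the choice of scale as a factor divided by the privacy budget are hypothetical values chosen for the example, and truncation is done by simple rejection sampling. The claim fixes only that the first noise is narrower than the second.

```python
import random


def truncated_laplace(scale, bound, rng):
    """Draw Laplace(0, scale) noise, rejecting draws outside [-bound, bound]."""
    while True:
        # The difference of two i.i.d. exponentials is Laplace-distributed.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        if abs(noise) <= bound:
            return noise


def add_location_noise(coords, epsilon, sensitivity, benchmark, rng,
                       narrow=0.5, wide=1.0, bound_factor=3.0):
    """Perturb coordinates with the first (narrower) or second (wider) noise.

    narrow, wide and bound_factor are assumed tuning values, not from the claim.
    """
    # Sensitivity not less than the benchmark selects the first, narrower noise.
    factor = narrow if sensitivity >= benchmark else wide
    scale = factor / epsilon
    bound = bound_factor * scale
    return tuple(c + truncated_laplace(scale, bound, rng) for c in coords)


if __name__ == "__main__":
    rng = random.Random(0)
    print(add_location_noise((10.0, 20.0), epsilon=1.0, sensitivity=2.0,
                             benchmark=1.0, rng=rng))
```

Truncating the noise changes the exact privacy guarantee compared with an untruncated Laplace mechanism, so the bound would need to be accounted for in any formal analysis.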
6. The method according to claim 5, characterized in that desensitizing the device identifier of each user to obtain a pseudo-device identifier comprises: for each user within the coverage area of the target edge node, determining a preset first random number, wherein the value of the first random number is between 0 and 1; calculating the cumulative probability of different pseudo-device identifiers using the following formula: [formula not reproduced in the source text]; in the formula, r represents the user's device identifier, o represents the pseudo-device identifier, the budget term is the first target privacy protection budget, the metric term is a metric between the pseudo-device identifier and the user's device identifier, Z represents a normalization constant, and the compliance check function performs compliance checks on the pseudo-device identifier; and taking the pseudo-device identifier whose cumulative probability is greater than the first random number as the pseudo-device identifier obtained by desensitizing the user's device identifier.
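Because the formula in claim 6 is not reproduced in the text, the following Python sketch shows only one plausible reading: an exponential-mechanism-style weighting in which each compliant candidate o receives weight exp(-ε·d(r, o)/2), normalized by Z, and the first candidate whose cumulative probability exceeds the first random number is returned. The weighting form, the function names, and the Hamming metric are all assumptions introduced for this example.

```python
import math


def hamming(a, b):
    """Hypothetical metric: number of differing positions in equal-length IDs."""
    return sum(x != y for x, y in zip(a, b))


def pick_pseudo_id(device_id, candidates, epsilon, u, distance, is_compliant):
    """Return the first candidate whose cumulative probability exceeds u.

    Weights follow an assumed exponential-mechanism form; candidates that
    fail the compliance check get zero weight.
    """
    weights = [
        math.exp(-epsilon * distance(device_id, c) / 2.0) if is_compliant(c) else 0.0
        for c in candidates
    ]
    z = sum(weights)  # normalization constant Z
    if z == 0.0:
        raise ValueError("no compliant candidate pseudo-identifier")
    cumulative = 0.0
    chosen = None
    for cand, w in zip(candidates, weights):
        if w == 0.0:
            continue
        chosen = cand  # remember the last compliant candidate as a fallback
        cumulative += w / z
        if cumulative > u:
            return cand
    return chosen  # guards against floating-point rounding near u = 1


if __name__ == "__main__":
    ids = ["aa", "ab", "bb"]
    print(pick_pseudo_id("aa", ids, epsilon=1.0, u=0.5,
                         distance=hamming, is_compliant=lambda c: True))
```

With a larger budget ε the weights concentrate on candidates close to the real identifier, while a smaller ε spreads the choice more evenly.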
7. The method according to claim 4, characterized in that performing secondary privacy protection processing on the user privacy dataset corresponding to the target edge node, based on the second target privacy protection budget and the spatiotemporal sensitivity corresponding to each sub-time period, comprises: for each user within the coverage area of the target edge node, calculating the proportion of Gaussian noise added to the user's privacy dataset according to the following formula: [formula not reproduced in the source text]; in the formula, the terms denote the weight parameters and a control parameter, wherein the value of the control parameter changes continuously as the privacy protection decision model converges and is smaller in the early stage of convergence than in the later stage of convergence; comparing the proportion with a preset second random number, wherein the value of the second random number is between 0 and 1; if the proportion is not less than the second random number, generating Gaussian noise corresponding to each sub-time period based on the second target privacy protection budget and the spatiotemporal sensitivity corresponding to each sub-time period, and adding the Gaussian noise corresponding to each sub-time period to the user's privacy dataset in the corresponding sub-time period; and if the proportion is less than the second random number, generating Laplace noise corresponding to each sub-time period based on the second target privacy protection budget and the spatiotemporal sensitivity corresponding to each sub-time period, and adding the Laplace noise corresponding to each sub-time period to the user's privacy dataset in the corresponding sub-time period.
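The noise selection in claim 7 can be sketched in Python under one reading: Gaussian noise is used when the proportion is not less than the second random number, and Laplace noise otherwise. The proportion is taken as an input because its formula is not reproduced in the text. The Gaussian scale uses the common (ε, δ) calibration, and `delta` is an assumption that does not appear in the claim; the function name is likewise hypothetical.

```python
import math
import random


def secondary_noise(p_gauss, u, epsilon, sensitivity, rng, delta=1e-5):
    """Pick and draw the secondary noise for one sub-time period.

    p_gauss: proportion of Gaussian noise (formula not given in the source).
    u: the preset second random number in [0, 1].
    delta: assumed parameter for the Gaussian calibration (not in the claim).
    """
    if p_gauss >= u:
        # Common Gaussian-mechanism calibration for (epsilon, delta)-DP.
        sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
        return "gaussian", rng.gauss(0.0, sigma)
    scale = sensitivity / epsilon  # Laplace mechanism scale
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return "laplace", noise


if __name__ == "__main__":
    rng = random.Random(0)
    print(secondary_noise(0.7, 0.3, epsilon=1.0, sensitivity=1.0, rng=rng))
    print(secondary_noise(0.2, 0.3, epsilon=1.0, sensitivity=1.0, rng=rng))
```

Since the control parameter is described as growing while the model converges, the proportion passed in would shift over training, changing how often each noise type is chosen.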
8. A privacy data protection device for communication networks, characterized by comprising: an acquisition module, used to acquire network operation status information of the communication network within a preset time period, and to acquire the user privacy dataset corresponding to a target edge node within the communication network; a decision module, used to analyze the network operation status information and the user privacy dataset using a pre-trained privacy protection decision model to obtain a target privacy budget allocation strategy that minimizes communication latency while meeting preset privacy protection requirements, wherein the target privacy budget allocation strategy includes at least: a first target privacy protection budget allocated to the target edge node and a second target privacy protection budget allocated to the central node, and the privacy protection decision model is trained based on a dual deep Q-learning algorithm and Stackelberg game theory. The specific training process includes: constructing an online Q-network and a target Q-network for solving the target privacy budget allocation strategy, and initializing the network weight parameters of the online Q-network and the target Q-network.
An experience pool is established and its capacity is determined; a first preset number of iteration cycles is determined, and the privacy budget allocation strategy and the network weight parameters are solved iteratively through the following steps: in each iteration cycle, the communication network environment state is initialized and a second preset number of calculation cycles is determined, wherein the communication network environment state includes at least: the user privacy dataset and the network operation status information corresponding to the target edge node in the communication network; in each calculation cycle, a second initial privacy protection budget of the central node is determined based on the current communication network environment state, and a first initial privacy protection budget of the target edge node is determined in combination with the preset total privacy budget; based on the first initial privacy protection budget and the second initial privacy protection budget, an initial privacy budget allocation strategy is determined; based on the initial privacy budget allocation strategy, a multi-round game is conducted using Stackelberg game theory to obtain multiple privacy budget allocation strategies; the current communication network environment state is input into the online Q-network, and the predicted Q-value corresponding to each privacy budget allocation strategy is calculated. The target privacy budget allocation strategy with the largest predicted Q-value is selected based on a greedy strategy, wherein the target privacy budget allocation strategy is the one that minimizes communication latency while meeting preset privacy protection requirements. Corresponding privacy processing is performed according to the target privacy budget allocation strategy to obtain the corresponding reward and a new state of the communication network environment.
The current communication network environment state, the target privacy budget allocation strategy, the reward, and the new state of the communication network environment are taken as a sample, and the sample is stored in the experience pool. Multiple samples randomly drawn from the experience pool are input into the neural network to calculate the probability of each sample being sampled, the mean squared error loss function, and the loss function weight. Based on the calculation results, the target Q-value corresponding to the second target privacy protection budget is determined. Based on the predicted Q-value and the target Q-value corresponding to the target privacy budget allocation strategy, the network weight parameters of the online Q-network are updated using the gradient descent method. After a third preset number of calculation cycles, the network weight parameters of the target Q-network are updated according to the network weight parameters of the online Q-network, wherein the third preset number is less than the second preset number. After the iteration is completed, the resulting target Q-network is used as the privacy protection decision model; and a privacy protection module, used to perform privacy protection processing on the user privacy dataset in accordance with the target privacy budget allocation strategy.
9. An electronic device, characterized by comprising: a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute, via the computer program, the privacy data protection method for a communication network according to any one of claims 1 to 7.