A deep reinforcement learning caching method based on spatiotemporal privacy protection
By combining a decentralized D2D federated averaging algorithm and a differential privacy-enhanced feature obfuscation mechanism with a resource-aware deep reinforcement learning model, the challenges of privacy protection and performance optimization in mobile edge caching are solved, achieving efficient, secure, and energy-saving caching in dynamic network environments.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: LIAONING UNIVERSITY OF TECHNOLOGY
- Filing Date: 2026-04-16
- Publication Date: 2026-06-19
AI Technical Summary
In mobile edge caching, existing technologies struggle to provide effective privacy protection while ensuring the performance of intelligent caching, especially in dynamic network environments where users' privacy data is easily leaked, violating data privacy protection regulations and provoking user resistance.
A decentralized D2D federated averaging algorithm is used for model collaboration. It combines a differential privacy-enhanced spatiotemporal feature obfuscation mechanism with a local joint deep reinforcement learning model that is aware of mobility and resources. Model parameters are exchanged and weighted aggregated through D2D communication between devices, and privacy-preserving feature processing is performed locally. A reward function is designed to optimize caching decisions.
It achieves end-to-end protection of highly sensitive spatiotemporal data in dynamic environments, improves privacy and security, optimizes caching performance and energy efficiency, reduces the risk of privacy leakage, and systematically solves the trade-off between adaptability to dynamic environments, privacy and security and resource constraints.
Description
Technical Field
[0001] This invention belongs to the field of wireless communication technology, and specifically relates to a deep reinforcement learning caching method based on spatiotemporal privacy protection.
Background Technology
[0002] With the large-scale deployment of 5G mobile communication technology and the widespread adoption of various data-intensive applications, mobile data traffic is experiencing explosive growth, posing unprecedented challenges to network capacity and service quality. Against this backdrop, mobile edge caching technology, by pre-storing popular content at the network edge closer to users, can significantly reduce content retrieval latency and alleviate backhaul link pressure, thus becoming a key enabling technology for improving user experience and optimizing network performance. However, in real-world 5G and Beyond 5G dynamic network environments, the high-speed mobility of users and the heterogeneity and time-varying nature of network topologies are intertwined, making traditional caching strategies based on static rules or fixed popularity assumptions difficult to apply. Therefore, in recent years, academia has focused on introducing artificial intelligence, especially deep reinforcement learning, into the field of edge caching. By designing a deep reinforcement learning framework that integrates user mobility prediction, it is possible to effectively utilize historical trajectory data to predict future locations and intelligently formulate caching decisions based on this, thereby achieving significant optimization of content transmission latency in dynamic environments.
[0003] However, a core issue that cannot be ignored is that the effectiveness of such high-performance intelligent caching strategies relies heavily on collecting and analyzing massive amounts of users' private data. Both the historical trajectories used to train location prediction models and the real-time state information (such as precise geographic location and content request records) used to drive reinforcement learning agents are highly sensitive personal data. To achieve global optimization, this data usually needs to be aggregated at a central entity for processing. This centralized processing model exposes users' movement patterns, behavioral habits, and even personal preferences to leakage risks, which can violate increasingly stringent data privacy regulations (such as the EU's General Data Protection Regulation) and may cause users to resist participating in collaborative caching services, ultimately hindering practical deployment of the technology. Therefore, how to build reliable privacy protection while preserving the performance advantages of intelligent caching is a crucial issue that current edge caching research must address.
Summary of the Invention
[0004] The purpose of this invention is to provide a deep reinforcement learning caching method based on spatiotemporal privacy protection, which can improve the privacy security of mobile edge caching and provide end-to-end protection for highly sensitive spatiotemporal data.
[0005] The technical solution provided by this invention is as follows:
[0006] A deep reinforcement learning caching method based on spatiotemporal privacy protection includes:
[0007] A deep reinforcement learning network model is constructed and trained to obtain a cache decision model;
[0008] The deep reinforcement learning network model corresponds one-to-one with the user equipment;
[0009] The user equipment's recent movement trajectory sequence across multiple time slots is acquired, and the movement trajectory sequence is encoded to obtain a recent movement pattern feature vector of the user equipment.
[0010] Noise is added to the user's recent mobility pattern feature vector using a differential privacy method to obtain a user equipment recent mobility pattern feature vector with enhanced differential privacy.
[0011] Decoding the recent mobility pattern feature vector of the user equipment yields a pseudo-future location sequence of the user equipment;
[0012] The user equipment's recent mobility pattern feature vector, user equipment's pseudo-future location sequence, user equipment's local cache state vector, current channel state, and summary information of the user equipment's logical neighbor cached content are integrated into a state space and input into the cache decision model; the cache decision model outputs a cache decision.
[0013] Preferably, the deep reinforcement learning network model is trained using a decentralized federated model, including the following steps:
[0014] User equipment $i$ lets its deep reinforcement learning network model interact with the environment to train the model locally and obtain self-updated model parameters $\theta_i$;
[0015] User equipment $i$ broadcasts its self-updated model parameters to all neighboring devices in the current time slot, and receives the set of self-updated model parameters sent by all neighboring devices in the current time slot;
[0016] User equipment $i$ performs model aggregation to obtain the updated deep reinforcement learning network model parameters $\bar{\theta}_i$:
[0017] $\bar{\theta}_i = \sum_{j \in \mathcal{N}_i(t)} w_{ij}\,\theta_j$;
[0018] where $w_{ij}$ denotes the aggregation weight between user equipment $i$ and neighboring device $j$, $\theta_j$ denotes the self-updated model parameters of neighboring device $j$, and $\mathcal{N}_i(t)$ denotes the set of neighboring devices of user equipment $i$ in the current time slot $t$.
[0019] Preferably, the aggregation weight between user equipment $i$ and neighboring device $j$ is calculated as:
[0020] $w_{ij} = \dfrac{n_j}{\sum_{k} n_k}$;
[0021] where $n_j$ is the number of effective experiences collected by neighboring device $j$ during interactions with the environment in the most recent collaboration cycle, and $n_k$ is the number of effective experience transitions collected by any device $k$ taking part in the aggregation during the most recent collaboration cycle, with the sum running over all such devices.
[0022] Preferably, the differential-privacy-enhanced recent mobility pattern feature vector of the user equipment is:
[0023] $\tilde{h}_i(t) = h_i(t) + \eta_i(t), \quad \eta_i(t) \sim \mathcal{N}(0, \sigma^2 I)$;
[0024] where $\tilde{h}_i(t)$ is the differential-privacy-enhanced recent mobility pattern feature vector of the user equipment, $h_i(t)$ is the recent mobility pattern feature vector of the user equipment, $\eta_i(t)$ is the differential privacy noise vector, $\sigma$ is the standard deviation of the Gaussian noise, $I$ is the identity matrix, and $\sigma^2 I$ is the covariance matrix.
[0025] Preferably, the standard deviation $\sigma$ is calculated as:
[0026] $\sigma = \dfrac{\Delta_2 \sqrt{2\ln(1.25/\delta)}}{\varepsilon}$;
[0027] where $\varepsilon$ and $\delta$ denote the differential privacy budget, and $\Delta_2$ is the $L_2$ sensitivity of the feature vectors output by the encoder on any two adjacent trajectory datasets (differing in only one time-slot location point).
[0028] Preferably, the cache decision model outputs cache decisions including: the user equipment's cache action, the CPU frequency selected during the user equipment's local model training, and the D2D transmit power selected during the user equipment's federated model parameter exchange.
[0029] Preferably, the reward function used for local model training of the deep reinforcement learning model on the user equipment is:
[0030] $r_i(t) = \omega_1 H_i(t) - \omega_2 D_i(t) - \omega_3\left(E_i^{\mathrm{cmp}}(t) + E_i^{\mathrm{com}}(t)\right)$;
[0031] where $H_i(t)$ denotes the cache hit rate of user equipment $i$ in time slot $t$, $D_i(t)$ denotes the average latency experienced by user equipment $i$ when fetching content in time slot $t$, $E_i^{\mathrm{cmp}}(t)$ is the energy consumed by local model training on user equipment $i$, $E_i^{\mathrm{com}}(t)$ is the energy consumed by user equipment $i$ during parameter exchange over D2D communication, and $\omega_1$, $\omega_2$, $\omega_3$ are weighting coefficients.
[0032] The beneficial effects of this invention are:
[0033] The deep reinforcement learning caching method based on spatiotemporal privacy protection provided by this invention can improve the privacy security of mobile edge caching and provide end-to-end protection for highly sensitive spatiotemporal data; it systematically solves the trade-off problem between dynamic environment adaptability, privacy security and resource constraints in mobile edge caching.
Attached Figure Description
[0034] Figure 1 is a schematic diagram of the distributed intelligent collaborative communication network model described in this invention.
[0035] Figure 2 is a comparison chart of content transmission latency under different caching strategies in the experimental examples of this invention.
[0036] Figure 3 is a comparison chart of system energy consumption over time under different caching strategies in the experimental examples of this invention.
[0037] Figure 4 is a comparison chart of the success rates of membership inference attacks against different caching strategies in the experimental examples of this invention.
Detailed Implementation
[0038] The present invention will now be described in further detail with reference to the accompanying drawings, so that those skilled in the art can implement it based on the description.
[0039] This invention provides a deep reinforcement learning caching method based on spatiotemporal privacy protection, aiming to overcome the difficulty of simultaneously achieving "performance optimization," "privacy protection," and "energy efficiency" in mobile edge caching. As shown in Figure 1, this invention is a closed-loop intelligent caching method running on every user device. Its core consists of three tightly coupled modules: 1) a decentralized periodic D2D federated averaging algorithm, responsible for secure model collaboration in the dynamic overlay network G(t); 2) a differential privacy-enhanced spatiotemporal feature obfuscation mechanism, which encodes the original movement trajectory data locally and adds noise to it, cutting off the path of spatiotemporal privacy leakage at the source; and 3) a mobility- and resource-aware local joint deep reinforcement learning model, which uses the privacy-preserved features to make joint decisions on caching and resource allocation. The following sections explain these three modules and the overall algorithm flow in detail.
[0040] I. Decentralized Periodic D2D Federated Average Algorithm
[0041] Traditional federated learning frameworks that use a central server as the aggregator suffer from single points of failure and communication bottlenecks, making them unsuitable for fully distributed network models. This invention designs a fully decentralized federated averaging algorithm based on device-to-device (D2D) communication. The core idea is that each device exchanges model parameters only with its direct neighbors in a dynamic logical overlay network, and global knowledge spreads gradually through the network over multiple rounds of local iteration.
[0042] During the local training phase (self-updating of model parameters), time is divided into macro-level collaboration cycles, each containing $T_c$ time slots. In each time slot $t$, device $i$ independently runs its local DRL agent (see the "Local Joint Deep Reinforcement Learning Model for Mobility and Resource Awareness" section below), interacts with the environment, and stores the resulting experience data in a local replay buffer used to periodically update its local model parameters. No model exchange between devices occurs during this stage.
[0043] At the end of each collaboration cycle, all devices enter the model exchange phase in parallel. Device $i$ broadcasts its current local DRL model parameters $\theta_i$ over the D2D link to all devices in its current logical neighbor set $\mathcal{N}_i(t)$ (i.e., all of its logical neighbors). While sending its own parameters, device $i$ also receives the model parameters $\theta_j$ sent by every neighboring device $j \in \mathcal{N}_i(t)$. Device $i$ then performs a model aggregation operation locally to generate a new generation of the local model, with updated model parameters $\bar{\theta}_i$:
[0044] $\bar{\theta}_i = \sum_{j \in \mathcal{N}_i(t)} w_{ij}\,\theta_j$;
[0045] The design of the aggregation weight $w_{ij}$ is crucial. To reflect the differences in data quality and quantity across devices, weights are allocated according to the amount of effective experience data:
[0046] $w_{ij} = \dfrac{n_j}{\sum_{k} n_k}$;
[0047] where $n_j$ is the number of valid experiences (data samples that play a significant role in the interaction between the agent and the environment) collected by device $j$ in the most recent cycle, and the sum over $k$ runs over all devices taking part in the aggregation. This count indirectly reflects how actively the device is learning and how fresh its data is. The weighting therefore gives greater weight, in local model aggregation, to devices with more and newer experience.
[0048] The aforementioned process of "local training - periodic exchange - local aggregation" is repeated continuously. Through multiple iterations, each device's local model not only contains its own private knowledge but also gradually absorbs privacy-preserving knowledge from multi-hop neighbors. This scheme adapts naturally to dynamic changes in network topology and requires no central coordinator, which makes it highly robust.
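The experience-weighted aggregation step described above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: model parameters are assumed to be flat lists of floats, the function and device names are hypothetical, and whether a device's own parameters take part in the aggregation is left to the caller (the caller simply decides which devices appear in the input dictionaries).

```python
from typing import Dict, List


def aggregation_weights(experience_counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each participating device by its share of effective experiences."""
    total = sum(experience_counts.values())
    if total <= 0:
        # Assumption: fall back to a simple average when no experience is reported.
        return {dev: 1.0 / len(experience_counts) for dev in experience_counts}
    return {dev: n / total for dev, n in experience_counts.items()}


def aggregate(params: Dict[str, List[float]],
              experience_counts: Dict[str, int]) -> List[float]:
    """Return the weighted sum of the parameter vectors of all listed devices."""
    weights = aggregation_weights(experience_counts)
    dim = len(next(iter(params.values())))
    out = [0.0] * dim
    for dev, theta in params.items():
        for k in range(dim):
            out[k] += weights[dev] * theta[k]
    return out


# Hypothetical example: three participating devices with 2-dimensional parameters.
params = {"dev_a": [1.0, 0.0], "dev_b": [0.0, 1.0], "dev_c": [1.0, 1.0]}
counts = {"dev_a": 50, "dev_b": 30, "dev_c": 20}
new_theta = aggregate(params, counts)
```

In a PyTorch setting the same weighted sum would typically be applied tensor by tensor over each model's state dictionary.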
[0049] II. Spatiotemporal Feature Obfuscation Mechanism for Differential Privacy Enhancement
[0050] The user movement trajectory $L_i$ is extremely sensitive spatiotemporal privacy data. If it, or its encoded features, were used directly as state input for DRL, the trajectory could be reconstructed through inference attacks even when only model parameters are exchanged. To provide provable privacy guarantees, this invention designs a differential privacy-enhanced spatiotemporal feature obfuscation mechanism, which includes three steps: encoding, adding noise, and decoding, all completed locally on the device.
[0051] The first step is spatiotemporal sequence encoding. Device $i$ maintains a recurrent neural network (RNN) encoder locally. In each time slot $t$, it feeds the raw coordinate sequence of its most recent $K$ time slots, $\{l_i(t-K+1), \dots, l_i(t)\}$, into the encoder to obtain a low-dimensional feature vector representing its recent movement pattern:
[0052] $h_i(t) = \mathrm{Enc}\big(l_i(t-K+1), \dots, l_i(t);\, \phi_i\big)$;
[0053] The feature vector $h_i(t)$ captures the user's movement pattern, but it is still strongly correlated with the original trajectory and cannot be shared directly.
[0054] To provide a strict privacy guarantee, a Gaussian noise mechanism is applied to the encoded feature vector. First, the $L_2$ sensitivity $\Delta_2$ of the feature vectors output by the encoder on any two adjacent trajectory datasets (differing in only one time-slot position) is calculated. Then, based on the preset privacy budget $\varepsilon$ and $\delta$, a noise vector $\eta_i(t)$ satisfying $(\varepsilon, \delta)$-differential privacy is generated, and the noise-added feature vector is obtained as follows:
[0055] $\tilde{h}_i(t) = h_i(t) + \eta_i(t), \quad \eta_i(t) \sim \mathcal{N}(0, \sigma^2 I)$;
[0056] where the noisy feature vector $\tilde{h}_i(t)$ carries a strict privacy guarantee: inferring the true trajectory $L_i$ from this feature becomes extremely difficult in theory.
[0057] The noisy feature vector $\tilde{h}_i(t)$ becomes part of the DRL state. In addition, device $i$ maintains a recurrent neural network (RNN) decoder locally, paired with the encoder. The decoder takes $\tilde{h}_i(t)$ as input and attempts to reconstruct a pseudo-future position sequence $\hat{L}_i(t)$, which is used for local DRL decision-making. Since the decoder is used only locally and its input is the noisy feature, its output introduces no additional privacy risk, yet it provides valuable, privacy-preserving mobility prediction information for caching decisions.
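The noise-adding step can be illustrated with a short sketch. It assumes the classic Gaussian-mechanism calibration $\sigma = \Delta_2\sqrt{2\ln(1.25/\delta)}/\varepsilon$ (commonly stated for $\varepsilon < 1$), and it bounds the sensitivity by clipping the feature vector to a fixed $L_2$ norm. The clipping step, the function names, and the default values are assumptions made for illustration and are not taken from the source.

```python
import math
import random
from typing import List, Optional


def gaussian_sigma(sensitivity: float, epsilon: float, delta: float) -> float:
    """Noise standard deviation under the classic Gaussian-mechanism calibration."""
    if epsilon <= 0 or not (0 < delta < 1):
        raise ValueError("epsilon must be > 0 and delta must lie in (0, 1)")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def clip_l2(vec: List[float], max_norm: float) -> List[float]:
    """Scale the vector down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    scale = max_norm / norm
    return [x * scale for x in vec]


def obfuscate(feature: List[float], epsilon: float, delta: float,
              max_norm: float = 1.0,
              rng: Optional[random.Random] = None) -> List[float]:
    """Clip the encoded feature, then add i.i.d. Gaussian noise to every entry."""
    rng = rng or random.Random()
    clipped = clip_l2(feature, max_norm)
    # Assumption: two clipped vectors differ by at most 2 * max_norm in L2 norm,
    # which serves as the sensitivity bound here.
    sigma = gaussian_sigma(2.0 * max_norm, epsilon, delta)
    return [x + rng.gauss(0.0, sigma) for x in clipped]


# Hypothetical 4-dimensional encoder output and privacy budget.
noisy = obfuscate([0.3, -1.2, 0.8, 0.1], epsilon=0.5, delta=1e-5,
                  rng=random.Random(0))
```

A smaller $\varepsilon$ or $\delta$ yields a larger $\sigma$, so the privacy budget directly controls how much the decoder's pseudo-future positions are blurred.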
[0058] III. Local Joint Deep Reinforcement Learning Model for Mobility and Resource Awareness
[0059] Each user device $i$ is an independent agent running an improved double deep Q-network (Double DQN) that makes joint decisions; its design fully accounts for mobility prediction and federated learning overhead.
[0060] State space $\mathcal{S}$: The state of the agent in time slot $t$ is a tuple:
[0061] $s_i(t) = \big(c_i(t),\, \tilde{h}_i(t),\, \hat{L}_i(t),\, g_i(t),\, z_i(t)\big)$;
[0062] where $c_i(t)$ is the binary local cache state vector of device $i$; $\tilde{h}_i(t)$ is the noisy feature vector, i.e., the differential-privacy-preserving mobility feature; $\hat{L}_i(t)$ is the short-term pseudo-future location sequence predicted by the local decoder; $g_i(t)$ is the current channel state information, including the channel quality to the base station and to the main neighbors; and $z_i(t)$ is a summary of the content cached by the logical neighbors, which can be obtained through lightweight neighbor signaling and contains no private data.
[0063] Action space $\mathcal{A}$: The agent makes a composite decision every few time slots:
[0064] $a_i(t) = \big(a_i^{\mathrm{c}}(t),\, f_i(t),\, p_i(t)\big)$;
[0065] where $a_i^{\mathrm{c}}(t)$ is the caching action, which determines which content block in the local cache is replaced and which new content replaces it; $f_i(t) \in \mathcal{F}$ is the CPU frequency selected for local model training in the next cycle, where $\mathcal{F}$ is the set of available CPU frequencies (a higher CPU frequency speeds up training but increases energy consumption); and $p_i(t) \in \mathcal{P}$ is the D2D transmit power selected for federated model parameter exchange in the next cycle, where $\mathcal{P}$ is the set of available D2D transmit powers (a higher transmit power improves transmission reliability and expands the effective neighbor range, but also increases energy consumption).
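Because a Q-network outputs one value per discrete action, the composite decision above is usually flattened into a single indexed list. The sketch below shows one way to do this, assuming small discrete sets for cache slots, candidate contents, CPU frequencies, and transmit powers. All names and example values are hypothetical.

```python
from itertools import product
from typing import List, NamedTuple, Sequence


class JointAction(NamedTuple):
    evict_slot: int      # index of the cache slot whose content is replaced
    new_content: int     # id of the content brought into that slot
    cpu_freq_hz: float   # CPU frequency for the next local training cycle
    tx_power_w: float    # D2D transmit power for the next parameter exchange


def build_action_space(cache_slots: int,
                       candidates: Sequence[int],
                       cpu_freqs: Sequence[float],
                       tx_powers: Sequence[float]) -> List[JointAction]:
    """Enumerate every combination of caching and resource choices."""
    return [JointAction(*combo)
            for combo in product(range(cache_slots), candidates, cpu_freqs, tx_powers)]


# Hypothetical sizes: 2 cache slots, 3 candidate contents, 2 frequencies, 2 powers.
actions = build_action_space(2, [10, 11, 12], [1e9, 2e9], [0.1, 0.2])
chosen = actions[5]  # a Q-network output index maps directly to a joint action
```

The flat list grows multiplicatively with each set, so in practice the candidate contents and resource levels would need to be kept small or the action space factorized.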
[0066] Reward function $r_i(t)$: The design aims to drive the agent to balance cache performance, privacy protection overhead, and energy consumption. It is calculated at the end of time slot $t$:
[0067] $r_i(t) = \omega_1 H_i(t) - \omega_2 D_i(t) - \omega_3\left(E_i^{\mathrm{cmp}}(t) + E_i^{\mathrm{com}}(t)\right)$;
[0068] where $H_i(t)$ denotes the cache hit rate of user equipment $i$ in time slot $t$; $D_i(t)$ denotes the average latency experienced by device $i$ when fetching content in time slot $t$; $\omega_1$, $\omega_2$, $\omega_3$ are weighting coefficients used to adjust the relative importance of the indicators; $E_i^{\mathrm{cmp}}(t)$ is the energy consumed by local model training on user equipment $i$; and $E_i^{\mathrm{com}}(t)$ is the energy consumed by user equipment $i$ during parameter exchange over D2D communication. The two energy terms are calculated using the following formulas:
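[0069] $E_i^{\mathrm{cmp}}(t) = \kappa\, C\, |\mathcal{D}_i|\, f_i(t)^2$;
[0070] where $C$ is the number of CPU cycles required for the local model to process one data sample during training, $\kappa$ is the effective capacitance coefficient of the user equipment CPU, $|\mathcal{D}_i|$ is the size of the local dataset of user equipment $i$, and $f_i(t)$ is the selected CPU frequency.
[0071] $E_i^{\mathrm{com}}(t) = \sum_{j \in \mathcal{N}_i(t)} \dfrac{p_i(t)\, S_{ij}}{B \log_2\!\left(1 + \dfrac{p_i(t)\, \beta_0\, \lVert q_i - q_j \rVert^{-\alpha}}{N_0 + I}\right)}$;
[0072] where $B$ is the D2D link bandwidth, $\beta_0$ is the path loss constant, $\alpha$ is the path loss exponent, $N_0$ is the noise power at the user equipment, $I$ is the interference between D2D links, $q_i$ and $q_j$ are the physical coordinates of user equipment $i$ and neighboring device $j$ respectively, $S_{ij}$ is the size of the model parameters exchanged between user equipment $i$ and neighboring device $j$, and $p_i(t)$ is the selected D2D transmit power.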
[0073] The reward function described above encourages the agent to learn caching strategies with a high hit rate, low latency, and low energy consumption.
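To make the trade-off in the reward concrete, the following sketch computes a per-slot reward from hit rate, latency, and the two energy terms. It relies on commonly used models that are assumptions here: CPU energy proportional to the square of the frequency, a Shannon-capacity D2D rate with distance-based path loss, and a single weight applied to the summed energy. Function names, weights, and numeric values are hypothetical.

```python
import math


def training_energy(kappa: float, cycles_per_sample: float,
                    num_samples: int, cpu_freq_hz: float) -> float:
    """Energy (J) of one local training pass: kappa * C * |D| * f^2."""
    return kappa * cycles_per_sample * num_samples * cpu_freq_hz ** 2


def d2d_rate(bandwidth_hz: float, tx_power_w: float, path_loss_const: float,
             path_loss_exp: float, distance_m: float,
             noise_w: float, interference_w: float) -> float:
    """Achievable D2D rate (bit/s) from the Shannon formula with path loss."""
    sinr = (tx_power_w * path_loss_const * distance_m ** (-path_loss_exp)
            / (noise_w + interference_w))
    return bandwidth_hz * math.log2(1.0 + sinr)


def exchange_energy(model_bits: float, rate_bps: float, tx_power_w: float) -> float:
    """Energy (J) to send the model once: transmit power times airtime."""
    return tx_power_w * model_bits / rate_bps


def reward(hit_rate: float, avg_delay: float, e_train: float, e_comm: float,
           w_hit: float = 1.0, w_delay: float = 1.0, w_energy: float = 1.0) -> float:
    """Reward rises with hit rate and falls with latency and total energy."""
    return w_hit * hit_rate - w_delay * avg_delay - w_energy * (e_train + e_comm)


# Hypothetical slot: 80% hit rate, 0.2 s latency, small energy costs.
e_cmp = training_energy(1e-28, 1e4, 100, 1e9)
rate = d2d_rate(1e6, 1.0, 1.0, 2.0, 1.0, 0.5, 0.5)
e_com = exchange_energy(2e6, rate, 0.5)
r = reward(0.8, 0.2, 0.1, 0.05)
```

Raising the CPU frequency or transmit power shortens training or transmission time but increases the energy penalty, which is the tension the agent has to learn to balance.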
[0074] The deep reinforcement learning caching method based on spatiotemporal privacy protection provided by this invention integrates all the above modules. The specific process, described from the perspective of user equipment $i$, is as follows.
[0075] Step 1: Initialize the local DRL (deep reinforcement learning) main network parameters $\theta_i$, the target network parameters $\theta_i^{-}$, the spatiotemporal encoder parameters $\phi_i$, the decoder parameters $\psi_i$, the privacy budget $(\varepsilon, \delta)$, and the collaboration cycle length $T_c$.
[0076] Step 2: Within each collaboration cycle, for each time slot $t = 1$ to $T_c$, user equipment $i$ repeats the following sub-steps:
[0077] Step 2.1: Obtain the state $s_i(t)$ from the environment according to the state definition given above;
[0078] Step 2.2: Select an action $a_i(t)$ from the action space according to the ε-greedy strategy;
[0079] Step 2.3: Perform the caching action $a_i^{\mathrm{c}}(t)$ to update the local cache, and configure resources according to $f_i(t)$ and $p_i(t)$;
[0080] Step 2.4: Record the cache hit rate $H_i(t)$ and average delay $D_i(t)$ observed when processing user requests within the time slot, together with the corresponding energy consumption $E_i^{\mathrm{cmp}}(t)$ and $E_i^{\mathrm{com}}(t)$;
[0081] Step 2.5: At the end of the time slot, calculate the reward $r_i(t)$ according to the reward function given above, and observe the new state $s_i(t+1)$;
[0082] Step 2.6: Store the sample $\big(s_i(t), a_i(t), r_i(t), s_i(t+1)\big)$ in the experience buffer;
[0083] Step 2.7: Sample mini-batches of data from the experience buffer to update the local main network parameters $\theta_i$;
[0084] Step 2.8: Periodically perform a soft update of the target network parameters $\theta_i^{-}$;
[0085] Step 3.1: Broadcast the current model parameters $\theta_i$ over the D2D link to all logical neighbors $j \in \mathcal{N}_i(t)$;
[0086] Step 3.2: Receive the model parameters $\{\theta_j\}$ from all neighbors $j \in \mathcal{N}_i(t)$;
[0087] Step 3.3: Calculate the weights $w_{ij}$ according to the aggregation weight formula, and aggregate the model parameters locally according to the aggregation formula to obtain the updated model parameters $\bar{\theta}_i$;
[0088] Step 3.4: Update the local model with the aggregated parameters: $\theta_i \leftarrow \bar{\theta}_i$;
[0089] Step 4: Update the spatiotemporal encoder parameters $\phi_i$ and decoder parameters $\psi_i$ using the latest local data.
[0090] Step 5: Return to Step 2 for the next round of iteration until the algorithm converges or the maximum number of training rounds is reached.
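Three small building blocks recur in the local training sub-steps above: ε-greedy action selection, the learning target, and the soft target-network update. The sketch below shows them on plain Python lists. It assumes the standard Double DQN target (the main network picks the next action, the target network evaluates it) and a Polyak-style soft update; the exact "improved" variant used in the source is not specified, and all names are hypothetical.

```python
import random
from typing import List, Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: random.Random) -> int:
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def double_dqn_target(reward: float, gamma: float,
                      q_main_next: Sequence[float],
                      q_target_next: Sequence[float],
                      done: bool = False) -> float:
    """Main network selects the next action; target network scores it."""
    if done:
        return reward
    best = max(range(len(q_main_next)), key=q_main_next.__getitem__)
    return reward + gamma * q_target_next[best]


def soft_update(target: Sequence[float], main: Sequence[float],
                tau: float) -> List[float]:
    """Move target parameters a fraction tau toward the main parameters."""
    return [tau * m + (1.0 - tau) * t for t, m in zip(target, main)]


rng = random.Random(0)
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0, rng=rng)
y = double_dqn_target(1.0, 0.9, q_main_next=[0.2, 0.8], q_target_next=[0.5, 0.3])
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1)
```

With PyTorch, the same logic would operate on tensors and on the parameters of the main and target networks rather than on flat lists.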
[0091] This invention fundamentally reconstructs the system model, establishing a two-layer network architecture. The bottom layer is the physical network responsible for content transmission, while the upper layer is a dynamic logical overlay network built specifically for distributed intelligent collaboration. Based on this model, this invention designs a decentralized periodic D2D federated averaging algorithm. This algorithm enables devices to exchange model parameters only with neighbors within their dynamic communication range via D2D links and to perform weighted aggregation locally. To protect the most sensitive user movement trajectory information, a lightweight spatiotemporal encoder-decoder network is introduced locally on the device. Calibrated Gaussian noise is injected into the encoded movement feature vector to ensure that it satisfies $(\varepsilon, \delta)$-differential privacy. All subsequent collaborative learning and decision-making are based on this noisy pseudo-feature, thus cutting off the path of spatiotemporal privacy leakage at the source and providing provable privacy protection. Finally, a mobility- and resource-aware local joint deep reinforcement learning model is constructed. Each device acts as an agent, and its state space integrates privacy-preserved mobility features with future location predictions, cache status, and network environment information. Its action space is a joint decision that covers not only replacement of cached content but also adaptive allocation of local computing and communication resources. With a reward function that combines cache hits, content latency, and energy consumption, the agent learns to optimize both cache performance and the system overhead of participating in federated learning under privacy constraints.
[0092] Test case
[0093] In a dynamic edge network environment, simulations are performed from multiple dimensions such as caching performance, resource efficiency, privacy protection strength, and algorithm robustness. The caching method of this invention is compared and analyzed with various benchmark algorithms to verify its effectiveness and superiority.
[0094] 1. Experiment setup and parameter configuration
[0095] A simulation platform was built using Python 3.9 and the PyTorch deep learning framework. The network topology covers a 500 m × 500 m square area, within which one macro base station and three small base stations are randomly deployed. The number of mobile user devices $N$ is adjustable between 20 and 100, with a default setting of 50. User equipment is randomly distributed within the area, and its movement is simulated using a random walk model, with speeds uniformly distributed between 0 and 3 m/s. The D2D communication radius is set to 50 m and is varied during sensitivity analysis. Communication parameters are set according to the 5G NR standard, and specific values are shown in Table 1.
[0096] Table 1 Simulation parameter configuration
[0097]
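The mobility and neighbor model of the setup can be sketched as follows. The area size, speed range, and D2D radius are taken from the description; the one-second slot duration, the clamping of positions at the area boundary, and all function names are assumptions made for illustration.

```python
import math
import random
from typing import List, Tuple

AREA_M = 500.0        # side length of the square area (from the setup)
D2D_RADIUS_M = 50.0   # default D2D communication radius (from the setup)
MAX_SPEED_MPS = 3.0   # upper end of the uniform speed range (from the setup)

Point = Tuple[float, float]


def random_walk_step(pos: Point, rng: random.Random, slot_s: float = 1.0) -> Point:
    """Move one slot in a random direction at a uniformly drawn speed."""
    speed = rng.uniform(0.0, MAX_SPEED_MPS)
    heading = rng.uniform(0.0, 2.0 * math.pi)
    x = pos[0] + speed * slot_s * math.cos(heading)
    y = pos[1] + speed * slot_s * math.sin(heading)
    # Assumption: devices are clamped to the area instead of reflecting.
    return (min(max(x, 0.0), AREA_M), min(max(y, 0.0), AREA_M))


def d2d_neighbors(positions: List[Point], i: int,
                  radius_m: float = D2D_RADIUS_M) -> List[int]:
    """Indices of all other devices within the D2D radius of device i."""
    xi, yi = positions[i]
    return [j for j, (x, y) in enumerate(positions)
            if j != i and math.hypot(x - xi, y - yi) <= radius_m]


rng = random.Random(0)
positions = [(rng.uniform(0.0, AREA_M), rng.uniform(0.0, AREA_M)) for _ in range(50)]
positions = [random_walk_step(p, rng) for p in positions]
neighbor_sets = [d2d_neighbors(positions, i) for i in range(len(positions))]
```

Recomputing `neighbor_sets` every slot gives a time-varying neighbor graph in the spirit of the dynamic overlay network G(t).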
[0098] Experimental simulations were conducted using the real-world MovieLens dataset, treating movies as content and user ratings (≥3 points) as implicit requests. By modeling the rating records of 6040 users on 3952 movies in the dataset, the long-tail distribution characteristics of global content popularity and the similarity of user preferences were extracted. The request generation model was extended to incorporate spatiotemporal characteristics brought about by mobility. The probability of a device making a request at a specific location is correlated with its movement trajectory and the scene at that location. Several scenes are predefined in the network area, each with its preferred content category. When a user moves to a scene, the probability of their requested content belonging to that scene's preferred category increases significantly, thus adding local spatiotemporal relevance to the global popularity.
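One simple way to realize the scene-dependent request model described above is to multiply the global popularity of each content by a boost factor when its category matches the scene's preferred category, then renormalize. The sketch below illustrates this idea only; the boost factor, the category labels, and the function names are hypothetical and are not specified in the source.

```python
import random
from typing import List, Sequence


def request_probs(popularity: Sequence[float], categories: Sequence[str],
                  scene_category: str, boost: float) -> List[float]:
    """Boost contents in the scene's preferred category, then renormalize."""
    weights = [p * (boost if c == scene_category else 1.0)
               for p, c in zip(popularity, categories)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_request(probs: Sequence[float], rng: random.Random) -> int:
    """Draw one content index according to the request distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical global popularity for three contents and a scene preferring "b".
popularity = [0.5, 0.3, 0.2]
categories = ["a", "b", "b"]
probs = request_probs(popularity, categories, scene_category="b", boost=2.0)
content = sample_request(probs, random.Random(0))
```

With these example numbers, the second content overtakes the globally most popular one while the user is inside the scene, which is the kind of local spatiotemporal relevance the description refers to.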
[0099] To conduct a comprehensive evaluation, the following representative benchmark algorithms were selected for comparison:
[0100] (1) Least Recently Used (LRU): Replace the content that has been requested the least in a certain period of time with the most popular content requested by the current user.
[0101] (2) Centralized DQN caching (Cen-DQN): A central controller located at the base station collects the complete state information of all users and trains a global DQN to formulate caching strategies for all users. This algorithm serves as the performance upper bound when privacy is not considered.
[0102] (3) NoCollab-DQN: Each device trains a DQN independently using only local data and does not exchange models or data with other devices in any way.
[0103] (4) Centralized Federated Average Cache (FedAvg-Cen): The classic federated average framework is adopted. The device trains the model locally and uploads the model parameters to the base station for aggregation. The aggregated global model is then distributed to each device.
[0104] (5) Decentralized Federated Average (DFL-Avg): A decentralized federated learning baseline in which each device exchanges model parameters only with its neighbors and aggregates them locally. It has neither differential privacy protection nor resource optimization.
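For reference, the rule-based baseline in item (1) can be sketched with an ordered dictionary. This follows the textbook LRU policy (evict the content whose last request is oldest), which may differ in detail from the variant used in the experiments; the class and method names are hypothetical.

```python
from collections import OrderedDict
from typing import Hashable


class LRUCache:
    """Fixed-capacity content cache with least-recently-used eviction."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, bool]" = OrderedDict()

    def request(self, content_id: Hashable) -> bool:
        """Serve a request; return True on a cache hit, False on a miss."""
        if content_id in self._items:
            self._items.move_to_end(content_id)  # mark as most recently used
            return True
        if len(self._items) >= self.capacity:
            self._items.popitem(last=False)      # evict least recently used
        self._items[content_id] = True
        return False


cache = LRUCache(capacity=2)
hits = [cache.request(c) for c in [1, 2, 1, 3, 2]]
hit_rate = sum(hits) / len(hits)
```

The hit rate computed this way is the same quantity the learning-based strategies are compared against.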
[0105] 2. Performance Evaluation and Comparative Analysis
[0106] In simulations running over 100,000 time slots, the steady-state results of the last 20,000 slots were selected for statistical analysis. Evaluation metrics included: average transmission delay, total system power consumption, privacy protection strength, and algorithm convergence robustness.
[0107] Figure 2 compares the transmission latency under different caching strategies, showing how the transmission latency of each algorithm varies with cache capacity. All learning-based algorithms significantly outperformed the traditional rule-based algorithm LRU, validating the effectiveness of proactive cache prediction. With small cache capacities, the hit rate and latency performance of our caching method (Our-Scheme) approached the performance upper bound set by Cen-DQN and significantly outperformed FedAvg-Cen and DFL-Avg. As cache capacity increased, the performance of all algorithms improved, and the gap between our caching method and Cen-DQN narrowed further, with the transmission latency difference falling below 5%. This shows that although the caching method of this invention is constrained by differential privacy noise and distributed local aggregation, it can still approach the performance of the centralized ideal algorithm through efficient weighted aggregation and local feature learning. NoCollab-DQN performed the worst, highlighting that inter-device collaborative learning is indispensable for improving overall caching performance under non-independent and identically distributed data.
[0108] Figure 3 shows the system's energy consumption over simulation time. The total energy consumption of the caching method of this invention is lower than that of Cen-DQN and FedAvg-Cen. The high energy consumption of Cen-DQN stems from the need for users to continuously upload high-dimensional state information to the center, while the energy consumption of FedAvg-Cen mainly comes from the periodic uplink transmission of model parameters between the device and the central server. The energy efficiency advantage of the caching method of this invention comes from two aspects: first, fully decentralized D2D communication is generally more energy-efficient than device-to-base-station uplink communication; second, through joint optimization the DRL agent learns to adaptively select the level of computational and communication resources, which ensures effective learning while avoiding unnecessary resource waste. Although DFL-Avg also uses D2D communication, its lack of a resource optimization mechanism results in higher energy consumption than the caching method of this invention.
[0109] To quantitatively evaluate the effectiveness of privacy protection, a membership inference attack was implemented. The attacker (modeled as a neighboring device) attempted to determine whether a specific user trajory-request record was used to train the target device's local model. A shadow model training method was used to construct the attack classifier, and the attack success rate was used as the measure of privacy leakage risk. Figure 4 compares the success rates of membership inference attacks against the different strategies. Cen-DQN and NoCollab-DQN (without privacy protection) had the highest success rates, reaching 87%, indicating complete privacy exposure. FedAvg-Cen and basic DFL-Avg reduced the attack success rate to approximately 55%-60% by not sharing the original data, but risks remain because the model parameters themselves memorize features of the training data. The caching method of this invention, after applying the differential privacy mechanism, suppresses the attack success rate to approximately 52%, close to the 50% level of random guessing. Further changes to the privacy budget $\varepsilon$ reduce the attack success rate only very slowly.
[0110] In addition, to analyze the contribution of each module in the caching method of the present invention, ablation experiments were conducted, and the following variants were compared. The results are shown in Table 2.
[0111] (1) Ours (w / o DP): Remove differential privacy noise module.
[0112] (2) Ours (w / o Weight): Change the weighted aggregation to simple average aggregation.
[0113] (3) Ours (w / o Resource): The DQN action space contains only cached actions, removing computation and communication resource optimizations.
[0114] Table 2 Comparison of ablation test performance (normalized values)
[0115]
[0116] As shown in Table 2, removing differential privacy (w / o DP) slightly improves cache performance and reduces energy consumption, but it completely disables privacy protection, leading to a surge in the attack success rate. This shows that privacy protection is necessary and that the cost this invention pays for it is reasonable. Removing weighted aggregation (w / o Weight) causes a significant performance decrease, indicating that weighting by the effective amount of data integrates high-quality data more effectively, which is key to improving distributed learning efficiency. Removing resource optimization (w / o Resource) significantly increases system energy consumption while leaving cache performance almost unaffected. This confirms that the joint optimization mechanism achieves significant energy savings at a small performance cost, which is crucial for energy-constrained mobile networks.
[0117] This invention addresses the data leakage risks of centralized learning by constructing a fully decentralized federated architecture based on D2D communication. Devices exchange model parameters only with their physical neighbors, achieving knowledge collaboration through periodic weighted aggregation, eliminating dependence on a central server, and improving system robustness and dynamic adaptability.
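The periodic weighted aggregation can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that model parameters are flat lists of floats, that each device reports the number of effective experiences it collected in the latest collaboration cycle, and that the caller decides which devices (possibly including the device itself) take part. The names `aggregate`, `params`, and `experience_counts` are hypothetical.

```python
from typing import Dict, List


def aggregate(params: Dict[str, List[float]],
              experience_counts: Dict[str, int]) -> List[float]:
    """Average parameter vectors, weighting each device by its share of experiences."""
    if not params:
        raise ValueError("no parameter vectors to aggregate")
    total = sum(experience_counts[device] for device in params)
    if total <= 0:
        raise ValueError("total experience count must be positive")
    dim = len(next(iter(params.values())))
    merged = [0.0] * dim
    for device, theta in params.items():
        if len(theta) != dim:
            raise ValueError("all parameter vectors must have the same length")
        weight = experience_counts[device] / total  # weights sum to 1
        for k in range(dim):
            merged[k] += weight * theta[k]
    return merged


# Toy example: the neighbor collected three times as many experiences,
# so its parameters receive weight 0.75.
merged = aggregate(
    params={"self": [1.0, 2.0], "neighbor": [3.0, 6.0]},
    experience_counts={"self": 1, "neighbor": 3},
)
```

Setting all counts equal reduces this to simple average aggregation, which corresponds to the "w / o Weight" ablation variant.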
[0118] To address the sensitivity of trajectory data, a differential privacy-enhanced spatiotemporal feature obfuscation module is designed. The original trajectory is encoded locally on the device, and Gaussian noise is injected to generate pseudo-features that meet strict privacy requirements, preventing leakage of the association between location and request records. In addition, a local deep reinforcement learning agent with mobility and resource awareness is designed. Its state space integrates the noise-perturbed spatiotemporal features, future location predictions, and network environment information; its action space jointly optimizes cache replacement, training computational resources, and communication power. A reward function drives the agent to balance latency, energy consumption, and cache hit rate. Simulation results show that, under strict privacy constraints, the transmission latency of this invention approaches the upper bound set by centralized performance without privacy protection, and the system energy consumption is significantly lower than that of the benchmark algorithms. The success rate of membership inference attacks is reduced to near the level of random guessing, confirming the effectiveness of privacy protection. Ablation experiments further verify that the weighted aggregation mechanism improves the cache hit rate and that the resource optimization module reduces system energy consumption.
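The two local building blocks described above, Gaussian feature obfuscation and the reward signal, can be sketched under stated assumptions. The noise scale uses the classic Gaussian-mechanism calibration, sigma = sensitivity * sqrt(2 ln(1.25 / delta)) / epsilon, which is assumed here rather than taken from the text. The reward's linear form, the summing of the two energy terms, and all names and default weights are likewise hypothetical.

```python
import math
import random
from typing import List, Optional


def gaussian_sigma(epsilon: float, delta: float, sensitivity: float) -> float:
    """Noise scale from the classic Gaussian mechanism (assumed calibration).

    The standard (epsilon, delta) guarantee for this formula is proven
    for 0 < epsilon < 1.
    """
    if epsilon <= 0 or not (0 < delta < 1) or sensitivity < 0:
        raise ValueError("need epsilon > 0, 0 < delta < 1, sensitivity >= 0")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def obfuscate(features: List[float], sigma: float,
              rng: Optional[random.Random] = None) -> List[float]:
    """Add independent zero-mean Gaussian noise to every feature component."""
    if rng is None:
        rng = random.Random()
    return [z + rng.gauss(0.0, sigma) for z in features]


def reward(hit_rate: float, latency: float, e_train: float, e_d2d: float,
           w_hit: float = 1.0, w_delay: float = 1.0,
           w_energy: float = 1.0) -> float:
    """Hypothetical reward: favor cache hits, penalize latency and total energy."""
    return w_hit * hit_rate - w_delay * latency - w_energy * (e_train + e_d2d)


# Toy values only; the sensitivity must come from the encoder's actual bound.
sigma = gaussian_sigma(epsilon=0.5, delta=1e-5, sensitivity=1.0)
noisy = obfuscate([0.2, -0.4, 0.9], sigma, rng=random.Random(0))
```

A smaller epsilon yields a larger sigma, so stronger privacy is bought with noisier state features; the reward weights then determine how the agent trades hit rate against latency and energy.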
[0119] In summary, this invention systematically solves the challenge of balancing dynamic environmental adaptability, privacy security, and resource constraints in mobile edge caching. It overcomes the privacy leakage bottleneck by constructing a mechanism of "decentralized federated learning + differential privacy obfuscation + joint resource optimization," providing end-to-end protection for highly sensitive spatiotemporal data while maintaining caching performance and energy efficiency. Furthermore, it provides a feasible technical path for distributed intelligent caching in privacy-sensitive scenarios, laying a theoretical foundation for the efficient and secure deployment of future B5G / 6G networks.
[0120] Although embodiments of the present invention have been disclosed above, the invention is not limited to the applications listed in the specification and embodiments. It can be applied to various fields suitable for the present invention, and those skilled in the art can easily make further modifications. Therefore, without departing from the general concept defined by the claims and their equivalents, the present invention is not limited to the specific details and illustrations shown and described herein.
Claims
1. A deep reinforcement learning caching method based on spatiotemporal privacy protection, characterized in that it comprises: constructing and training a deep reinforcement learning network model to obtain a cache decision model, wherein the deep reinforcement learning network models correspond one-to-one with the user equipment; acquiring the user equipment's recent movement trajectory sequence across multiple time slots, and encoding the movement trajectory sequence to obtain a recent mobility pattern feature vector of the user equipment; adding noise to the recent mobility pattern feature vector of the user equipment using a differential privacy method to obtain a differential-privacy-enhanced recent mobility pattern feature vector of the user equipment; decoding the recent mobility pattern feature vector of the user equipment to obtain a pseudo-future location sequence of the user equipment; and integrating the user equipment's recent mobility pattern feature vector, the user equipment's pseudo-future location sequence, the user equipment's local cache state vector, the current channel state, and summary information of the content cached by the user equipment's logical neighbors into a state space that is input into the cache decision model, wherein the cache decision model outputs a cache decision.
2. The deep reinforcement learning caching method based on spatiotemporal privacy protection according to claim 1, characterized in that the deep reinforcement learning network model is trained using a decentralized federated model, including the following steps: the deep reinforcement learning network model of user equipment $i$ interacts with the environment to train the model locally and obtain self-updated model parameters; user equipment $i$ broadcasts its self-updated model parameters to all neighboring devices in the current time slot, and receives the set of self-updated model parameters sent by all neighboring devices in the current time slot; user equipment $i$ performs model aggregation to obtain the updated deep reinforcement learning network model parameters $\theta_i^{t+1}$: $\theta_i^{t+1} = \sum_{j \in \mathcal{N}_i^t} w_{ij}\,\tilde{\theta}_j^{t}$; where $w_{ij}$ denotes the aggregation weight of user equipment $i$ and neighboring device $j$, $\tilde{\theta}_j^{t}$ denotes the self-updated model parameters of neighboring device $j$, and $\mathcal{N}_i^t$ denotes the set of neighboring devices of user equipment $i$ in the current time slot $t$.
3. The deep reinforcement learning caching method based on spatiotemporal privacy protection according to claim 2, characterized in that the aggregation weight $w_{ij}$ of user equipment $i$ and neighboring device $j$ is calculated as: $w_{ij} = \dfrac{n_j}{\sum_{k} n_k}$; where $n_j$ is the number of effective experience transitions that neighboring device $j$ collected from interactions with the environment in the most recent collaboration cycle, and $n_k$ is the number of effective experience transitions that any user device $k$ in the aggregation collected from interactions with the environment in the most recent collaboration cycle.
4. The deep reinforcement learning caching method based on spatiotemporal privacy protection according to claim 2 or 3, characterized in that the differential-privacy-enhanced recent mobility pattern feature vector of the user equipment is: $\tilde{z} = z + \eta$, $\eta \sim \mathcal{N}(0, \sigma^2 I)$; where $\tilde{z}$ is the differential-privacy-enhanced recent mobility pattern feature vector of the user equipment, $z$ is the recent mobility pattern feature vector of the user equipment, $\eta$ is the differential privacy noise vector, $\sigma$ is the standard deviation of the Gaussian noise, and $I$ is the identity matrix.
5. The deep reinforcement learning caching method based on spatiotemporal privacy protection according to claim 4, characterized in that the standard deviation $\sigma$ is calculated as: $\sigma = \dfrac{\Delta \sqrt{2 \ln(1.25/\delta)}}{\varepsilon}$; where $\varepsilon$ and $\delta$ denote the differential privacy budget, and $\Delta$ denotes the sensitivity of the feature vectors output by the encoder on any two adjacent trajectory datasets.
6. The deep reinforcement learning caching method based on spatiotemporal privacy protection according to claim 5, characterized in that the cache decision output by the cache decision model includes: the user equipment's caching action, the CPU frequency selected for the user equipment's local model training, and the D2D transmit power selected for the user equipment's federated model parameter exchange.
7. The deep reinforcement learning caching method based on spatiotemporal privacy protection according to claim 6, characterized in that the reward function used for local training of the deep reinforcement learning model on the user equipment is: $r_i^t = \omega_1 h_i^t - \omega_2 d_i^t - \omega_3 \left(E_i^{\mathrm{cmp}} + E_i^{\mathrm{com}}\right)$; where $h_i^t$ denotes the cache hit rate of user equipment $i$ in time slot $t$, $d_i^t$ denotes the average latency experienced by user equipment $i$ when retrieving content in time slot $t$, $E_i^{\mathrm{cmp}}$ is the energy consumed by the local model training of user equipment $i$, $E_i^{\mathrm{com}}$ is the energy consumed by the D2D communication of user equipment $i$ during parameter exchange, and $\omega_1$, $\omega_2$, and $\omega_3$ are all weighting coefficients.