Dynamic edge computing server placement method based on deep reinforcement learning
Using grayscale heatmaps and agent models based on deep reinforcement learning, the method addresses the dynamic adjustment of edge server placement, achieving adaptive and efficient edge server layout optimization and improving computing resource utilization and service quality.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- FUDAN UNIVERSITY
- Filing Date
- 2022-06-20
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies lack the ability to dynamically adjust the placement of edge servers, making them unable to adapt to changes in the environment and network. This results in reduced service performance and wasted resources, and traditional optimization algorithms cannot solve complex problems within an acceptable timeframe.
A deep reinforcement learning-based approach is adopted to autonomously determine the dynamic placement of edge servers by constructing grayscale heatmaps and agent models. Fixed costs, dynamic costs, and communication time costs are considered, and the D3QN and PPO algorithms are used to optimize model training.
It enables adaptive and intelligent placement of edge servers, improving model learning speed and computational efficiency, making it suitable for dynamic network environments, and reducing resource waste and latency.
Figure CN117311952B_ABST
Description
Technical Field
[0001] This invention relates to the field of edge computing technology, and in particular to a dynamic edge computing server placement method based on deep reinforcement learning.
Background Technology
[0002] Mobile Edge Computing (MEC) places data center-level computing and storage resources closer to users and terminals at the network edge, alleviating the high latency and core network congestion issues associated with providing remote computing to users from cloud computing centers. MEC's main functionality relies on edge servers, which are designed to provide edge computing services for low-latency, high-bandwidth, and complex mobile applications (such as augmented reality and video analytics). Therefore, deploying an MEC environment first requires considering the placement of edge servers. A well-designed edge server layout can effectively improve the response latency of computing tasks, reduce system energy consumption, enhance communication and computing resource utilization, and improve Quality of Service (QoS). Conversely, a poor edge server layout can severely impact QoS and easily lead to network congestion and resource waste. Therefore, the placement and layout of edge servers need to be strategically optimized based on network conditions, user distribution, and task types.
[0003] On the one hand, current research on edge server placement is conducted almost entirely in a static environment, considering only the network conditions at a single moment or time period. Locations are selected once under fixed, given parameters, and the layout of the edge servers remains unchanged afterwards. However, in real-world scenarios, the environment, demands, service costs, and efficiency change over time, and existing network infrastructure may also change due to urban development, which degrades edge service performance or renders the original edge server layout unsuitable. The location and layout of edge servers should therefore be treated as a long-term, continuous decision-making process, which makes dynamic deployment essential. On the other hand, there is currently no research using Deep Reinforcement Learning (DRL) for dynamic edge server placement. Since edge server placement is an NP-hard (non-deterministic polynomial-time hard) and non-convex problem, existing research employs traditional optimization algorithms to obtain solutions within an acceptable timeframe, such as integer linear programming, clustering algorithms, multi-stage heuristics, Lagrangian relaxation, and bio-inspired intelligent algorithms (e.g., genetic algorithms, particle swarm optimization, and simulated annealing). However, these algorithms typically require complex iterative optimization for each solution and cannot adapt to changes in the network and environment. Furthermore, as the network grows larger and the environment more complex, these algorithms can no longer solve the problem within an acceptable timeframe.
Summary of the Invention
[0004] This invention addresses the aforementioned problems and aims to provide a method for dynamic edge computing server placement based on deep reinforcement learning. The invention employs the following technical solution:
[0005] This invention provides a dynamic edge computing server placement method based on deep reinforcement learning, characterized by the following steps: Step S1, acquiring base station information in a real-world scenario, including the geographic location of the base station and its load status; Step S2, constructing the environmental state $O_t$ of the base station based on the geographic location and the load status, and constructing a heat map of the base station; Step S3, based on the heat map of the base station, and using the load status of the base station as a metric, converting the environmental state $O_t$ into a grayscale heatmap; Step S4, inputting the grayscale heatmap into a neural network for training with a deep reinforcement learning algorithm to obtain a trained agent model, which is an artificial intelligence model used to place edge servers dynamically; Step S5, inputting the grayscale heatmap into the agent model as the state, whereupon the agent model, based on the environmental state $O_t$, autonomously determines and executes an edge server placement action $a_t$, thereby placing the edge servers, changing the environmental state from $O_t$ to the next environmental state $O_{t+1}$, and receiving an immediate reward $r_t$; Step S6, updating the agent model based on the next environmental state $O_{t+1}$ and the immediate reward $r_t$, and using the current agent model to guide the next placement action; Step S7, iterating continuously, repeatedly updating and optimizing the agent model until it converges and reaches the predetermined optimization goal, that is, maximizing the accumulated reward.
[0006] The dynamic edge computing server placement method based on deep reinforcement learning provided by this invention may also have the following technical features, wherein step S2 includes the following sub-steps: step S2-1, locating the base station position using the latitude and longitude coordinates of the base station to reflect the spatial distribution of the base station; step S2-2, inverting the load of the base station and mapping it to a grayscale range of [0, 255], where 255 represents white and 0 represents black, that is, the greater the load of the base station, the darker the color of the heatmap.
[0007] The dynamic edge computing server placement method based on deep reinforcement learning provided by this invention may also have the following technical features: the training of the deep reinforcement learning agent in step S4 adopts D3QN-based agent training, with the following training steps: Step A1, initialize the main Q network and the target Q network; Step A2, input the state into the network and select an edge server placement action $a_t$ according to the ε-greedy policy; Step A3, after the action $a_t$ is performed, the state $O_t$ transitions to the next state $O_{t+1}$ and an immediate reward $r_t$ is received; Step A4, set the next state as the current state, $O_t = O_{t+1}$, and store the learning experience $(O_t, a_t, O_{t+1}, r_t)$ in the experience pool; Step A5, after the experience pool is full, sample learning experiences from the experience pool according to the batch size and input them into the main Q network and the target Q network; Step A6, calculate the loss function and perform gradient descent to update the parameters of the main Q network; Step A7, after every K steps, copy the parameters of the main Q network to the target Q network to update the parameters of the target Q network.
[0008] The dynamic edge computing server placement method based on deep reinforcement learning provided by this invention also has the following technical features: the training of the deep reinforcement learning agent in step S4 further adopts the high-performance PPO-based agent training, and the training steps are as follows: Step B1, initialize policy parameters; Step B2, use the policy to interact with the environment to collect the state, the action and the reward, and calculate the advantage function value; Step B3, continuously update the policy parameters and find the optimal policy parameters corresponding to the objective function.
[0009] The dynamic edge computing server placement method based on deep reinforcement learning provided by this invention also has the following technical features: the dynamic placement of the edge server in step S4 takes into account relevant placement indicators, including fixed cost, dynamic cost, and communication time cost; the fixed cost refers to the cost incurred when the edge server is placed in different locations during dynamic placement. The dynamic cost refers to the cost incurred due to changes in the edge server's dynamic placement. The calculation expression is as follows:
[0010]
[0011]
[0012]
[0013] In the formulas, the three cost terms denote, respectively, the cost of adding new edge servers, the cost saved by removing edge servers, and the cost of migrating edge servers; $t$ and $t-1$ denote placement periods, and $S^{(t)}$ and $S^{(t-1)}$ denote the sets of edge server layout results within those placement periods. The communication time cost includes the transmission latency experienced by mobile users, which is affected by where the edge servers are placed, and the waiting time of the base stations or users being served. The transmission latency equals the sum of the signal transmission delay and the propagation delay, and the waiting time is the time difference between two consecutive service receptions.
[0014] The dynamic edge computing server placement method based on deep reinforcement learning provided by this invention may also have the following technical features, wherein step S5 includes the following sub-steps: Step S5-1, based on the existing computing needs in the network, determining the number of edge servers for period t and completing their placement; Step S5-2, selecting a fixed number of edge servers to migrate, changing the original layout of the edge servers; Step S5-3, the edge servers completing their service to the base stations, which changes the current load status of the base stations; Step S5-4, the changes in base station load status and edge server locations thereby changing the environmental state of the base stations.
[0015] The dynamic edge computing server placement method based on deep reinforcement learning provided by this invention may also have the following technical features, wherein the immediate reward $r_t$ in step S5 is a hybrid reward composed of the fixed cost, the dynamic cost, and the time cost, expressed as follows:
[0016]
[0017] In the formula, F() represents a function that maps time to cost.
[0018] The dynamic edge computing server placement method based on deep reinforcement learning provided by this invention may also have the following technical features, further including: step S8, comparing the performance of the trained agent model with the K-medoids, Top-K, and Random algorithms widely used in edge server placement research.
[0019] Invention Function and Effect
[0020] The dynamic edge computing server placement method based on deep reinforcement learning according to the present invention, by considering the geographical and temporal factors of business in real edge computing environments, achieves adaptive and intelligent placement of edge servers based on deep reinforcement learning, filling the gap in research on the intelligent implementation of dynamic edge server placement and effectively guiding the dynamic placement of edge servers in real-world scenarios. It provides an effective solution for transforming sparse reinforcement learning problems into computer vision problems. Furthermore, the present invention converts the environmental state into a grayscale heatmap input instead of directly inputting a color image, greatly reducing the computational load of the neural network, improving the model's learning speed, and facilitating neural network training.
Attached Figure Description
[0021] Figure 1 is a flowchart illustrating the dynamic edge computing server placement method based on deep reinforcement learning in an embodiment of the present invention;
[0022] Figure 2 is a simplified flowchart of the dynamic edge computing server placement method based on deep reinforcement learning according to an embodiment of the present invention;
[0023] Figure 3 is a base station heatmap in an embodiment of the present invention;
[0024] Figure 4 is a schematic diagram of the converted grayscale heatmap in an embodiment of the present invention;
[0025] Figure 5 is a schematic diagram of the waiting time of the base station in period t in an embodiment of the present invention;
[0026] Figure 6 is a schematic diagram of the edge server layout during period t in an embodiment of the present invention;
[0027] Figure 7 is a schematic diagram of the dynamic placement of edge servers in an embodiment of the present invention;
[0028] Figure 8 is a schematic diagram illustrating the dynamic placement principle of edge servers based on deep reinforcement learning in an embodiment of the present invention;
[0029] Figure 9 is a schematic diagram of the test hyperparameter settings in an embodiment of the present invention;
[0030] Figure 10 is a schematic diagram of the performance test results in an embodiment of the present invention.
Detailed Implementation
[0031] To make the technical means, creative features, objectives and effects of the present invention easy to understand, the following describes in detail the method and apparatus for placing a dynamic edge computing server based on deep reinforcement learning, in conjunction with embodiments and accompanying drawings.
[0032] <Example>
[0033] Figure 1 is a flowchart illustrating the dynamic edge computing server placement method based on deep reinforcement learning in an embodiment of the present invention; Figure 2 is a schematic diagram of the method flow of an embodiment of the present invention; Figure 3 is a schematic diagram of a base station heat map in an embodiment of the present invention; Figure 4 is a schematic diagram of the converted grayscale heatmap in an embodiment of the present invention; Figure 5 is a schematic diagram of the waiting time of the base station in period t in an embodiment of the present invention.
[0034] As shown in Figures 1 and 2, the dynamic edge computing server placement method based on deep reinforcement learning mainly includes the following steps:
[0035] Step S1: Obtain base station information in the real-world scenario, including the geographic location of the base station and the load status of the base station;
[0036] Step S2: Construct the environmental state $O_t$ of the base station based on the geographic location and the load status, and construct a heat map of the base station;
[0037] In this embodiment, the environmental state $O_t$ is a matrix composed of elements such as [number of base stations, base station location, base station load, number of edge servers, edge server location]. Since the matrix cannot adequately reflect the spatial distance relationship between base stations, the correspondence between base stations and servers, or the dynamic changes in base station load, this embodiment uses the latitude and longitude coordinates of the base stations to reconstruct their distribution on a map, and then uses the base station load as a metric to draw a heat map of the base stations. The higher the base station load, the darker the color in the heat map.
[0038] Step S3: Based on the heat map of the base station, and using the load status of the base station as a metric, convert the environmental state $O_t$ into a grayscale heatmap;
[0039] Typically, the input to a neural network in deep reinforcement learning is a tensor. For the edge server placement problem, this tensor consists of the number of base stations, the number of edge servers, the base station locations, and the base station loads. On the one hand, tensor inputs cannot intuitively represent the spatial distribution of base stations or the matching relationship between base station loads, and they are not conducive to feature capture by the neural network. On the other hand, the service demands of mobile users in a network can usually be obtained by collecting service information from communication base stations. Furthermore, the more mobile users there are and the more frequent the service requests in an area, the higher the load on the communication base stations in that area. This geographical and density relationship between base stations and mobile users can be well reflected by heatmaps. As shown in Figure 3, using the load of the base stations (Shanghai Telecom) as a metric, the density of users and services in different areas can be highlighted. However, the background elements in Figure 3 are clearly complex, so directly using Figure 3 as the environmental state input would be highly detrimental to network training. Therefore, this embodiment proposes extracting the useful information from the base station heatmap and converting the environmental state into a grayscale heatmap. The specific process is as follows:
[0040] ① The location of each communication base station is determined by its latitude and longitude coordinates, reflecting the spatial distribution of the base stations. ② The base station load is inverted and mapped to the grayscale range [0, 255], where 255 represents white and 0 represents black; the higher the base station load, the darker the color in the heatmap. ③ The mapped base station load is used as a metric to draw a heatmap of the base stations, which represents one aspect of the environmental state. ④ In addition to the grayscale heatmap of the base stations, the waiting time of the base stations in period t (shown in Figure 5) and the positions of the edge servers in period t (shown in Figure 6) are also part of the environmental state. Using a similar method, these two states can also be represented as grayscale images, so the environmental state is composed of three grayscale images. Furthermore, using grayscale images instead of color images as input significantly reduces the computational cost of the neural network and improves the model's learning speed.
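The inversion and mapping described in ① to ③ can be sketched roughly as follows. This is only an illustrative sketch: the function names, the min/max normalisation of coordinates, the grid size, and the choice to rasterise each base station to a single pixel are assumptions and are not specified in this embodiment.

```python
def load_to_gray(load, max_load):
    """Invert a load and map it to [0, 255]: 255 is white, 0 is black,
    so a higher load gives a darker (smaller) value."""
    if max_load <= 0:
        return 255
    ratio = min(max(load / max_load, 0.0), 1.0)
    return round(255 * (1.0 - ratio))


def build_gray_heatmap(stations, height, width):
    """Rasterise base stations onto a height x width grayscale grid.

    stations: list of (latitude, longitude, load) tuples (assumed layout).
    Returns a list of rows; empty pixels stay white (255).
    """
    lats = [s[0] for s in stations]
    lons = [s[1] for s in stations]
    lat_min, lat_max = min(lats), max(lats)
    lon_min, lon_max = min(lons), max(lons)
    max_load = max(s[2] for s in stations)

    grid = [[255] * width for _ in range(height)]
    for lat, lon, load in stations:
        # Normalise coordinates to pixel indices; north is at the top row.
        row = 0 if lat_max == lat_min else int(
            (lat_max - lat) / (lat_max - lat_min) * (height - 1))
        col = 0 if lon_max == lon_min else int(
            (lon - lon_min) / (lon_max - lon_min) * (width - 1))
        # If two stations share a pixel, keep the darker (busier) one.
        grid[row][col] = min(grid[row][col], load_to_gray(load, max_load))
    return grid


if __name__ == "__main__":
    demo = [(0.0, 0.0, 10.0), (1.0, 2.0, 100.0), (2.0, 1.0, 50.0)]
    for row in build_gray_heatmap(demo, 3, 3):
        print(row)
```

The same routine could be applied to the waiting times and the edge server positions to obtain the other two grayscale images, giving a three-channel state.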
[0041] Step S4: Input the grayscale heatmap into the neural network for training of the deep reinforcement learning algorithm to obtain the trained agent model, which is an artificial intelligence model used to realize the dynamic placement of edge servers.
[0042] The core of deep reinforcement learning is training an agent to autonomously execute actions based on the current state. Common deep reinforcement learning agents include Deep Q-Network (DQN), Deep Deterministic Policy Gradient (DDPG), Asynchronous Advantage Actor-Critic (A3C), and Proximal Policy Optimization (PPO). DQN is unsuitable for learning policies with high-dimensional continuous actions, and the action space grows dramatically as the number of base stations and edge servers increases; DQN also suffers from overestimation. Therefore, in this embodiment, an improved DQN, Dueling Double DQN (D3QN), is used as one of the agents. D3QN combines the ideas of the Double DQN and Dueling DQN algorithms, further improving the algorithm's performance. In addition, because the actions in dynamic edge server placement are discrete and DDPG is unsuitable for learning discrete actions, this embodiment further employs the more efficient PPO agent.
[0043] Specifically, the agent training steps based on D3QN are as follows: ① Initialize the main Q network and the target Q network; ② Input the state into the network and select an edge server placement action $a_t$ according to the ε-greedy policy; ③ After this action is performed, the state $O_t$ transitions to the next state $O_{t+1}$ and an immediate reward $r_t$ is received; ④ Set the next state as the current state, $O_t = O_{t+1}$, and store the learning experience $(O_t, a_t, O_{t+1}, r_t)$ in the experience pool; ⑤ Once the experience pool is full, sample learning experiences from the experience pool according to the batch size and input them into the main Q network and the target Q network; ⑥ Calculate the loss function and perform gradient descent to update the parameters of the main Q network; ⑦ Copy the parameters of the main Q network to the target Q network every K steps to update the parameters of the target Q network.
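The building blocks named in these steps can be illustrated with a small, framework-free sketch. It shows the standard dueling aggregation and Double DQN target as they are commonly defined, plus an experience pool and ε-greedy selection; the class and function names, the discount factor, and the use of plain Python lists in place of neural network outputs are all assumptions made for illustration, not details of this embodiment.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-capacity pool of (state, action, next_state, reward) tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, next_state, reward):
        self.buffer.append((state, action, next_state, reward))

    def is_full(self):
        return len(self.buffer) == self.buffer.maxlen

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.buffer), batch_size)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def dueling_q_values(state_value, advantages):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


def double_dqn_target(reward, next_q_main, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the main network selects the next action and the
    target network evaluates it, which reduces overestimation."""
    if done:
        return reward
    best = max(range(len(next_q_main)), key=lambda a: next_q_main[a])
    return reward + gamma * next_q_target[best]
```

In a full implementation the lists passed to `dueling_q_values` and `double_dqn_target` would come from the main and target Q networks, and the squared difference between the target and the main network's estimate would serve as the loss in step ⑥.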
[0044] The agent training steps based on PPO are as follows: ① Initialize policy parameters; ② Use the policy to interact with the environment to collect states, actions, and rewards, and calculate the advantage function value; ③ Continuously update the policy parameters to find the optimal policy parameters corresponding to the objective function.
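As a rough illustration of steps ② and ③, the sketch below computes a simple advantage estimate and the clipped surrogate objective commonly associated with PPO. Whether this embodiment uses the clipped variant, this particular advantage estimate, or these hyperparameter values is not stated in the text, so all of them are assumptions, as are the function names.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return-to-go for each step of one trajectory."""
    returns, running = [], 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards, values, gamma=0.99):
    """Simple advantage estimate: discounted return minus a value baseline."""
    return [g - v for g, v in zip(discounted_returns(rewards, gamma), values)]


def ppo_clip_objective(new_probs, old_probs, advs, clip_eps=0.2):
    """Clipped surrogate objective, averaged over samples.

    new_probs / old_probs: probability of the taken action under the
    updated and the data-collecting policy, respectively.
    """
    total = 0.0
    for p_new, p_old, adv in zip(new_probs, old_probs, advs):
        ratio = p_new / p_old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum removes the incentive to move the policy
        # far away from the one that collected the data.
        total += min(ratio * adv, clipped * adv)
    return total / len(advs)
```

Step ③ would then correspond to repeatedly adjusting the policy parameters by gradient ascent on this objective.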
[0045] Figure 7 is a schematic diagram of the dynamic placement of edge servers in an embodiment of the present invention;
[0046] As shown in Figure 7, edge servers are typically placed alongside existing network infrastructure such as communication base stations. This not only integrates computing and communication capabilities but also significantly reduces placement and maintenance costs. After the service provider initially deploys the edge servers (placing them at base stations 2, 3, and 4), the existing placement layout needs to be dynamically adjusted over time to reflect changes in the network environment, service demands, and service costs, maintaining the efficiency and reliability of edge computing services. Assuming the edge server layout update cycle is t, there are four scenarios for changes in the edge server placement layout during the t-th cycle: ① Addition: Add edge server 1 at base station 1; ② Removal: Remove the edge server at base station 2; ③ Migration: Migrate the edge server at base station 4 to base station 5; ④ No change: The edge server at base station 3 remains unchanged. The key issue in dynamic edge server placement is how to adjust the edge server placement layout promptly based on changes in system status, that is, how to achieve dynamic decision-making for edge server placement.
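The four scenarios can be made concrete by comparing the layouts of two consecutive cycles. The sketch below is illustrative only: representing a layout as a mapping from a server identifier to a base station number, and the identifiers `es_a` to `es_d`, are assumptions introduced here.

```python
def diff_layouts(previous, current):
    """Classify layout changes between two cycles.

    previous / current: dict mapping server id -> base station id.
    Returns a dict with the added, removed, migrated and unchanged servers.
    """
    added = {s: loc for s, loc in current.items() if s not in previous}
    removed = {s: loc for s, loc in previous.items() if s not in current}
    migrated = {
        s: (previous[s], loc)
        for s, loc in current.items()
        if s in previous and previous[s] != loc
    }
    unchanged = {
        s: loc
        for s, loc in current.items()
        if s in previous and previous[s] == loc
    }
    return {
        "added": added,
        "removed": removed,
        "migrated": migrated,
        "unchanged": unchanged,
    }


if __name__ == "__main__":
    # Initial deployment at base stations 2, 3 and 4, then the adjustments
    # described above: add at 1, remove at 2, migrate 4 -> 5, keep 3.
    layout_prev = {"es_a": 2, "es_b": 3, "es_c": 4}
    layout_curr = {"es_b": 3, "es_c": 5, "es_d": 1}
    print(diff_layouts(layout_prev, layout_curr))
```

Tracking servers by identifier rather than as a bare set of locations is what allows a migration to be distinguished from an unrelated removal and addition.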
[0047] Since the dynamic placement of edge servers is essentially a continuous decision-making problem, the system needs to adjust the layout of edge servers based on the system state in each cycle. The dynamic placement decision should be adaptive and capable of long-term planning. Therefore, it is essential to train a model based on artificial intelligence methods to intelligently implement the dynamic placement of edge servers. Furthermore, when considering the placement of edge servers, the impact of cost and communication time must be taken into account, as follows:
[0048] Fixed Costs: Deploying edge servers requires renting space from telecom operators or landowners, and the rent is closely related to geographical location. Rent is typically higher in commercial centers and lower in suburbs. Therefore, it's necessary to assess the fixed costs of placing edge servers in different locations based on their geographical context.
[0049] Dynamic Costs: The dynamic placement of edge servers involves purchasing, installing, dismantling, and transporting them, which incurs corresponding dynamic placement costs. Let $S^{(t)}$ and $S^{(t-1)}$ denote the edge server layout results for periods $t$ and $t-1$, respectively. The cost of dynamically adjusting the edge servers within each placement cycle can then be calculated as follows:
[0050]
[0051]
[0052]
[0053] In the formulas, the three cost terms denote, respectively, the cost of adding new edge servers, the cost saved by removing edge servers, and the cost of migrating edge servers;
[0054] Time cost: When a mobile user accesses an edge server, the user first connects wirelessly to a base station and then reaches the edge server through that base station. The location of the edge server therefore affects the transmission latency for mobile users, which equals the sum of the signal transmission delay and the propagation delay. Furthermore, because the service capacity of the edge servers is less than the total computing demand in the network, some base stations and users are allowed to go unserved for a certain period, and the edge servers dynamically adjust the service of the entire network. Unserved base stations or users therefore experience a waiting time, which equals the time difference between two consecutive service receptions.
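The two components of the time cost can be written down directly from these definitions. The following is a minimal sketch; the function names and the representation of service receptions as a sorted list of timestamps are assumptions for illustration.

```python
def transmission_latency(signal_transmission_delay, propagation_delay):
    """Transmission latency = signal transmission delay + propagation delay."""
    return signal_transmission_delay + propagation_delay


def waiting_times(service_times):
    """Waiting time between each pair of consecutive service receptions.

    service_times: ascending timestamps (or period indices) at which a
    base station or user was served.
    """
    return [later - earlier
            for earlier, later in zip(service_times, service_times[1:])]
```

For example, a base station served in periods 2, 5, and 6 would have waited 3 periods and then 1 period under this definition.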
[0055] Step S5: Input the grayscale heatmap into the agent model as the state; based on the environmental state $O_t$, the agent model autonomously determines and executes an edge server placement action $a_t$, thereby placing the edge servers, changing the environmental state from $O_t$ to the next environmental state $O_{t+1}$, and receiving an immediate reward $r_t$;
[0056] In this embodiment, given the state input, the agent executes action $a_t$ according to the environmental state, and the state transition proceeds as follows: ① Determine the number of edge servers for period t based on the existing computing demands in the network and complete their placement; ② Select a fixed number of edge servers to migrate, changing the original edge server layout; ③ The edge servers complete their service to the base stations, changing the current load on the base stations; ④ The base station load, the latency, and the edge server locations all change, so the environmental state transitions from the current state $O_t$ to the next state $O_{t+1}$, and the agent receives a reward $r_t$. The reward is defined as a hybrid reward consisting of the fixed cost, the dynamic cost, and the time cost, expressed as follows:
[0057]
[0058] In the formula, F() represents a function that maps time to cost.
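To make the role of F() concrete, the sketch below shows one possible way such a hybrid reward could be assembled. It is not the expression of this embodiment: the negative sign (lower total cost gives a higher reward), the unweighted sum, the linear form of the time-to-cost mapping, and all names are assumptions chosen purely for illustration.

```python
def time_to_cost(total_time, cost_per_unit_time=1.0):
    """Hypothetical linear F(): maps a time quantity to a cost."""
    return cost_per_unit_time * total_time


def hybrid_reward(fixed_cost, dynamic_cost, total_time, F=time_to_cost):
    """Assumed reward: the negative of the summed costs, so that an agent
    maximising reward is minimising cost."""
    return -(fixed_cost + dynamic_cost + F(total_time))
```

Any monotonically increasing F() would play the same role of putting latency and waiting time on the same scale as the monetary costs.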
[0059] Step S6: Update the agent model based on the next environmental state $O_{t+1}$ and the immediate reward $r_t$, and use the current agent model to guide the next placement action;
[0060] Step S7: Iterate and repeatedly update and optimize the agent model until the agent model converges to the predetermined optimization goal, that is, maximize the accumulated reward.
[0061] Figure 8 is a schematic diagram illustrating the dynamic placement principle of edge servers based on deep reinforcement learning in an embodiment of the present invention;
[0062] In this embodiment, as shown in Figure 8, the edge server dynamic placement method based on deep reinforcement learning consists of two parts: an agent and an environment. The main idea is that the agent learns from collected samples, autonomously determining and executing edge server placement actions based on the environmental state. After an action is executed, the edge servers provide edge computing services for the base stations' computing tasks, which changes the environmental state to the next state. The agent then receives a reward from the system for the executed action; the size of the reward reflects the quality of the action. After collecting samples, the agent updates its model and uses the current model to guide the next action. This process is repeated until the model converges. The overall goal of model optimization is to maximize the cumulative reward.
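The interaction between agent and environment can be sketched as a short loop. The example below is a toy stand-in, not the method of this embodiment: the environment with static loads, the reward equal to the load of the chosen base station, and the tabular agent with optimistic initial values (used here in place of a neural network) are all hypothetical simplifications.

```python
import random


class ToyPlacementEnv:
    """Hypothetical environment: each period one edge server is placed at
    one base station, and the reward is the load it serves there."""

    def __init__(self, loads):
        self.loads = tuple(loads)

    def reset(self):
        return self.loads                    # state O_t (static in this toy)

    def step(self, action):
        reward = float(self.loads[action])   # immediate reward r_t
        return self.loads, reward            # next state O_{t+1}, r_t


class TabularAgent:
    """Stand-in for the agent model: one value estimate per (state, action)."""

    def __init__(self, n_actions, epsilon=0.1, lr=1.0, init=100.0, seed=0):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.lr = lr
        self.init = init                     # optimistic start forces trying each action
        self.q = {}
        self.rng = random.Random(seed)

    def _values(self, state):
        return self.q.setdefault(state, [self.init] * self.n_actions)

    def act(self, state):
        values = self._values(state)
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: values[a])

    def update(self, state, action, reward):
        # One-step update that ignores future rewards (a simplification).
        values = self._values(state)
        values[action] += self.lr * (reward - values[action])


def run(env, agent, steps):
    """Observe state, act, receive reward, update the model, repeat."""
    state = env.reset()
    total = 0.0
    for _ in range(steps):
        action = agent.act(state)
        next_state, reward = env.step(action)
        agent.update(state, action, reward)
        total += reward
        state = next_state
    return total
```

With exploration switched off (`epsilon=0.0`), the optimistic initial values make the agent try every base station once and then settle on the most heavily loaded one, which mirrors the idea that the reward signal steers later placement actions.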
[0063] Figure 9 is a schematic diagram of the test hyperparameter settings in an embodiment of the present invention; Figure 10 is a schematic diagram of the performance test results in an embodiment of the present invention.
[0064] This embodiment also compares the trained agent model with the K-medoids, Top-K, and Random algorithms widely used in edge server placement research. The test hyperparameter settings are shown in Figure 9, and the test results are shown in Figure 10.
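For orientation, two of the baselines can be sketched in a few lines. The embodiment does not define them, so the sketch assumes their usual meaning: Top-K places servers at the K most heavily loaded base stations, and Random picks K base stations uniformly at random. K-medoids, which clusters base stations by location, is omitted here for brevity; the function names are hypothetical.

```python
import random


def top_k_placement(loads, k):
    """Place k edge servers at the k base stations with the highest load.

    loads: list of base station loads, indexed by base station.
    Returns the chosen base station indices in ascending order.
    """
    ranked = sorted(range(len(loads)), key=lambda i: loads[i], reverse=True)
    return sorted(ranked[:k])


def random_placement(n_stations, k, seed=None):
    """Place k edge servers at k distinct base stations chosen at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_stations), k))
```

Both baselines decide a layout from a single snapshot, which is the static behavior the learned agent is compared against.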
[0065] Functions and effects of the embodiments
[0066] The dynamic edge computing server placement method based on deep reinforcement learning according to the present invention, considering the geographical and temporal factors of business in real edge computing environments, achieves adaptive and intelligent placement of edge servers based on deep reinforcement learning, filling the gap in research on the intelligent implementation of dynamic edge server placement and effectively guiding the dynamic placement of edge servers in real-world scenarios. It provides an effective solution for transforming sparse reinforcement learning problems into computer vision problems. The present invention converts environmental states into grayscale heatmap inputs instead of directly inputting color images, greatly reducing the computational load of the neural network, improving the model's learning speed, and facilitating neural network training. According to test results, it is evident that the embodiments of the present invention have a significant performance improvement compared to other classic algorithms. Furthermore, the present invention can be directly applied to dynamic combinatorial optimization (such as the traveling salesman problem), dynamic infrastructure site selection (such as dynamic transportation station and factory site selection), and other related problems, possessing good applicability, practicality, and engineering feasibility.
[0067] The above embodiments are only used to illustrate specific implementations of the present invention, and the present invention is not limited to the scope of the description of the above embodiments.
Claims
1. A dynamic edge computing server placement method based on deep reinforcement learning, characterized in that it comprises the following steps:
Step S1: obtaining base station information in a real-world scenario, including the geographic location and the load status of each base station;
Step S2: constructing an environmental state of the base stations based on the geographic location and the load status, and constructing a heat map of the base stations;
Step S3: based on the heat map of the base stations, and taking the load status of the base stations as the criterion, converting the environmental state into a grayscale heat map;
Step S4: inputting the grayscale heat map into a neural network and training it with a deep reinforcement learning algorithm to obtain a trained agent model, the agent model being an artificial intelligence model used to place edge servers dynamically; the dynamic placement of the edge servers takes into account placement indicators including a fixed cost, a dynamic cost, and a communication time cost;
the fixed cost is the cost incurred when the edge servers are placed at different locations during dynamic placement;
the dynamic cost is the cost incurred by changes to the edge servers during dynamic placement, and is calculated from the cost of adding new edge servers, the cost saved by removing edge servers, and the cost of migrating edge servers, the expression being defined over the placement periods and the sets of edge server layout results within each placement period;
the communication time cost includes the transmission delay that results from the edge server locations affecting mobile user access, and the waiting time of the base station or user being served, wherein the transmission delay equals the sum of the signal transmission delay and the propagation delay, and the waiting time is the time difference between two consecutive service receptions;
Step S5: inputting the grayscale heat map as the state into the agent model, the agent model autonomously determining and executing an edge server placement action based on the current environmental state, thereby placing the edge servers and changing the environmental state to the next environmental state, and at the same time receiving an immediate reward, the immediate reward being a hybrid reward that combines the fixed cost, the dynamic cost, and the communication time cost, in the expression of which a function mapping time to cost is used;
Step S6: updating the agent model based on the next environmental state and the immediate reward, and using the current agent model to guide the next placement action;
Step S7: iteratively updating and optimizing the agent model until it converges to the predetermined optimization goal, namely maximizing the cumulative reward.
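By way of non-limiting illustration, the conversion of base station locations and loads into a grayscale heat map (steps S2 and S3) might be sketched as follows. The sketch assumes the convention of claim 2 (heavier load is darker, 255 is white, 0 is black). The function names `load_to_gray` and `build_heatmap`, the linear normalization by the maximum load, the grid resolution, and the rule of keeping the heaviest load per grid cell are all assumptions made for the example and are not taken from the patent.

```python
def load_to_gray(load, max_load):
    """Map a load to an 8-bit gray level; a heavier load gives a darker value."""
    if max_load <= 0:
        return 255  # assumption: with no load anywhere, render white
    ratio = min(max(load / max_load, 0.0), 1.0)  # clamp to [0, 1]
    return round(255 * (1.0 - ratio))  # inverse mapping: full load -> 0 (black)


def _to_index(value, lo, hi, n):
    """Linearly scale value from [lo, hi] onto a grid index in [0, n - 1]."""
    if hi == lo:
        return 0
    return int((value - lo) / (hi - lo) * (n - 1))


def build_heatmap(stations, rows, cols):
    """Rasterize (latitude, longitude, load) tuples onto a rows x cols gray grid."""
    lats = [s[0] for s in stations]
    lons = [s[1] for s in stations]
    max_load = max(s[2] for s in stations)
    grid = [[0.0] * cols for _ in range(rows)]
    for lat, lon, load in stations:
        r = _to_index(lat, min(lats), max(lats), rows)
        c = _to_index(lon, min(lons), max(lons), cols)
        # Assumption: when several stations share a cell, keep the heaviest load.
        grid[r][c] = max(grid[r][c], load)
    return [[load_to_gray(v, max_load) for v in row] for row in grid]
```

In this sketch, cells without any base station carry zero load and therefore render as white, so the dark regions of the grid mark the heavily loaded areas that the agent model would observe as its state.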
2. The dynamic edge computing server placement method based on deep reinforcement learning according to claim 1, characterized in that step S2 includes the following sub-steps:
Step S2-1: locating each base station by its latitude and longitude coordinates to reflect the spatial distribution of the base stations;
Step S2-2: inverting the load of each base station and mapping it to the grayscale range, in which 255 represents white and 0 represents black, so that the greater the load on a base station, the darker its color in the heat map.
3. The dynamic edge computing server placement method based on deep reinforcement learning according to claim 1, characterized in that, in step S4, the deep reinforcement learning agent is trained with a D3QN-based procedure whose steps are as follows:
Step A1: initializing the main Q-network and the target Q-network;
Step A2: inputting the state into the network and selecting an edge server placement action according to an ε-greedy policy;
Step A3: after the action is performed, the state transitions to the next state and an immediate reward is obtained;
Step A4: setting the next state as the current state and storing the learning experience in the experience pool;
Step A5: once the experience pool is full, sampling learning experiences from the experience pool according to the batch size and inputting them into the main Q-network and the target Q-network;
Step A6: calculating the loss function and performing gradient descent to update the parameters of the main Q-network;
Step A7: every K steps, copying the parameters of the main Q-network to the target Q-network to update the parameters of the target Q-network.
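By way of non-limiting illustration, the quantities used when computing the loss in step A6 might be sketched as follows, assuming that D3QN denotes a dueling double deep Q-network as the term is commonly used. The claim itself does not spell out the acronym, so this reading is an assumption, as are the function names `dueling_q` and `double_dqn_target`, the discount factor `gamma`, and the `done` flag. The Q-values are passed in as plain lists rather than produced by real networks.

```python
def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, q_main_next, q_target_next, gamma=0.99, done=False):
    """Double DQN target for one transition.

    The main network chooses the next action; the target network evaluates it.
    q_main_next and q_target_next hold Q-values of the next state, one entry
    per placement action.
    """
    if done:
        return reward  # no bootstrap term on a terminal transition
    best_action = max(range(len(q_main_next)), key=q_main_next.__getitem__)
    return reward + gamma * q_target_next[best_action]
```

Under these assumptions, the loss of step A6 would measure the gap between this target and the main Q-network's estimate for the action actually taken, and step A7 would refresh the values that feed `q_target_next`.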
4. The dynamic edge computing server placement method based on deep reinforcement learning according to claim 1, characterized in that, in step S4, the training of the deep reinforcement learning agent further employs agent training based on the PPO algorithm, the training steps being as follows:
Step B1: initializing the policy parameters;
Step B2: using the policy to interact with the environment to collect the states, the actions, and the rewards, and calculating the advantage function values;
Step B3: continuously updating the policy parameters to find the optimal policy parameters with respect to the objective function.
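By way of non-limiting illustration, the objective function optimized in step B3 might take the clipped surrogate form commonly associated with PPO. The claim does not state which PPO variant is used, so the clipped form, the function name `ppo_clip_objective`, and the clipping range `clip_eps` are assumptions made for the example. The probability ratios (new policy over old policy) and the advantage values of step B2 are supplied as plain lists.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch of samples.

    ratios[i] is pi_new(a_i | s_i) / pi_old(a_i | s_i); advantages[i] is the
    advantage estimate for the same sample. The policy update maximizes
    the returned value.
    """
    terms = []
    for ratio, adv in zip(ratios, advantages):
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum removes any gain from moving the ratio
        # outside the interval [1 - clip_eps, 1 + clip_eps].
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)
```

The clipping keeps each update close to the policy that collected the data, which is one reason the same batch of placement experience can be reused for several gradient steps.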
5. The dynamic edge computing server placement method based on deep reinforcement learning according to claim 1, characterized in that step S5 includes the following sub-steps:
Step S5-1: based on the existing computing demand in the network, determining the number of edge servers for period t and completing the placement of the edge servers;
Step S5-2: selecting a fixed number of the edge servers for location migration, thereby changing the original layout of the edge servers;
Step S5-3: the edge servers completing service to the base stations, thereby changing the current load status of the base stations;
Step S5-4: changing the environmental state of the base stations based on the changes in the load status of the base stations and in the edge server locations.
6. The dynamic edge computing server placement method based on deep reinforcement learning according to claim 1, characterized in that the performance of the trained agent model is tested by comparing it with the K-medoids, Top-K, and Random algorithms used in edge server placement research.
Citation Information
Patent Citations
Multi-agent deep reinforcement learning strategy optimization method based on attention mechanism
CN113392935A
Communication method and communication device based on artificial intelligence
CN114071484A