End-to-end deterministic slice orchestration method and device based on deep reinforcement learning
By employing an end-to-end deterministic slicing orchestration method based on deep reinforcement learning, network resource allocation is dynamically optimized, addressing the shortcomings of traditional static resource allocation methods in the power industry. This achieves high reliability, low latency, and deterministic transmission of power services, thereby improving resource utilization.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- STATE GRID FUJIAN POWER ELECTRIC CO ECONOMIC RESEARCH INSTITUTE
- Filing Date
- 2024-11-27
- Publication Date
- 2026-06-19
AI Technical Summary
Traditional static resource allocation methods are ill-suited to the dynamic network environment and complex business needs in the power industry, leading to resource waste or failure to meet the high reliability, low latency, and deterministic requirements of power services in a timely manner.
An end-to-end deterministic slice orchestration method based on deep reinforcement learning is adopted. By mapping physical network nodes through virtualization technology, a resource allocation optimization model for information freshness, reliability, latency, and throughput is constructed. The optimization solution is obtained by using deep reinforcement learning algorithm and Markov decision process model, and the network slice resource allocation is dynamically adjusted.
The deterministic needs of power services are met, communication latency is reduced, reliability is improved, and high service satisfaction is maintained under different network conditions.
Description
Technical Field
[0001] This invention relates to the field of communication technology, and in particular to an end-to-end deterministic slice orchestration method and apparatus based on deep reinforcement learning.
Background Technology
[0002] In the power industry, the introduction of 5G private networks can effectively support the communication needs of various business scenarios in smart grids, such as remote power equipment monitoring, smart meter data collection, load dispatching, fault detection and early warning, etc. To ensure the power industry's requirements for high reliability, low latency and deterministic service in data transmission, 5G power virtual private networks are widely used, providing end-to-end customized communication services for different businesses through network slicing technology.
[0003] Network slicing, a core technology of 5G networks, virtualizes physical network resources into multiple independent logical network slices to meet the Quality of Service (QoS) requirements of different services. However, in the power industry, due to the critical nature of power services and their high demands for communication quality, especially for ultra-reliable low-latency communication (URLLC), more precise network resource management and slicing orchestration algorithms are needed to ensure deterministic end-to-end communication. Traditional static resource allocation methods struggle to cope with dynamic network environments and complex service requirements, leading to resource waste or failure to meet service demands in a timely manner.
Summary of the Invention
[0004] The technical problem to be solved by the present invention is to provide an end-to-end deterministic slicing orchestration method and apparatus based on deep reinforcement learning, which optimizes network resource allocation and meets the deterministic requirements of power services for information freshness, reliability, latency and throughput.
[0005] To solve the above-mentioned technical problems, the technical solution adopted by the present invention is as follows:
[0006] An end-to-end deterministic slice orchestration method based on deep reinforcement learning includes:
[0007] By mapping physical network nodes to virtual networks using virtualization technology, an end-to-end network slicing virtualization model is obtained.
[0008] Obtain resource data of the target network slice, and calculate the information freshness index, reliability index, latency index and throughput index of the target network slice based on the resource data;
[0009] A resource allocation optimization model is constructed based on the information freshness index, reliability index, latency index, and throughput index of the target network slice;
[0010] The resource allocation optimization model is optimized and solved using deep reinforcement learning algorithms and Markov decision process models to obtain the resource allocation scheme for the target network slice.
[0011] Resources are allocated to different services according to the resource allocation scheme.
[0012] To solve the above-mentioned technical problems, another technical solution adopted by the present invention is as follows:
[0013] An end-to-end deterministic slice orchestration apparatus based on deep reinforcement learning includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the steps of the end-to-end deterministic slice orchestration method based on deep reinforcement learning described above.
[0014] The beneficial effects of this invention are as follows: by constructing a resource allocation optimization model that includes information freshness index, reliability index, latency index, and throughput index, the deterministic requirements of power services are ensured to be met; at the same time, the introduction of deep reinforcement learning algorithms, compared with traditional fixed resource allocation strategies, can continuously adjust the resource allocation of network slices to maintain a high level of service satisfaction under different network conditions, achieve deterministic transmission of power services, reduce communication latency, and improve reliability.
Attached Figure Description
[0015] Figure 1 is a flowchart of the steps of an end-to-end deterministic slice orchestration method based on deep reinforcement learning in an embodiment of the present invention.
[0016] Figure 2 is a schematic diagram of the end-to-end network slicing architecture of a 5G power virtual private network used by the end-to-end deterministic slice orchestration method based on deep reinforcement learning in an embodiment of the present invention.
[0017] Figure 3 is a schematic diagram of deterministic service resource allocation based on deep reinforcement learning in the end-to-end deterministic slice orchestration method according to an embodiment of the present invention.
[0018] Figure 4 is a schematic diagram of an end-to-end deterministic slice orchestration device based on deep reinforcement learning in an embodiment of the present invention.
Detailed Implementation
[0019] To explain in detail the technical content, objectives, and effects of the present invention, the following description is provided in conjunction with the embodiments and accompanying drawings.
[0020] An end-to-end deterministic slice orchestration method based on deep reinforcement learning includes:
[0021] By mapping physical network nodes to virtual networks using virtualization technology, an end-to-end network slicing virtualization model is obtained.
[0022] Obtain resource data of the target network slice, and calculate the information freshness index, reliability index, latency index and throughput index of the target network slice based on the resource data;
[0023] A resource allocation optimization model is constructed based on the information freshness index, reliability index, latency index, and throughput index of the target network slice;
[0024] The resource allocation optimization model is optimized and solved using deep reinforcement learning algorithms and Markov decision process models to obtain the resource allocation scheme for the target network slice.
[0025] Resources are allocated to different services according to the resource allocation scheme.
[0026] As can be seen from the above description, the beneficial effects of the present invention are as follows: by constructing a resource allocation optimization model that includes information freshness index, reliability index, latency index, and throughput index, the deterministic requirements of power services are ensured to be met; at the same time, the introduction of deep reinforcement learning algorithms, compared with traditional fixed resource allocation strategies, can continuously adjust the resource allocation of network slices, maintain a high level of service satisfaction under different network conditions, achieve deterministic transmission of power services, reduce communication latency, and improve reliability.
[0027] Furthermore, the calculation of the information freshness index, reliability index, latency index, and throughput index of the target network slice based on the resource data includes:
[0028] Obtain the parameter value of the target parameter to be calculated, and the preset comparison value corresponding to the target parameter;
[0029] The target parameters include information freshness, reliability, latency, and throughput;
[0030] The evaluation value of the target parameter is obtained by comparing the parameter value with the preset comparison value;
[0031] The evaluation values of all target parameters under the target network slice are obtained and the average value is calculated to obtain the target index corresponding to the target parameter.
[0032] As described above, by obtaining the parameter values of information freshness, reliability, latency, and throughput, the parameter values are compared with preset comparison values to determine whether the information freshness and other parameter values of the service meet the transmission requirements. The average value of the evaluation values of all target parameters under the target network slice is used as the corresponding indicator to evaluate each indicator.
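The compare-then-average step described above can be sketched as follows. This is only an illustration: the pass/fail evaluators `meets_upper_bound` and `meets_lower_bound`, the function `slice_index`, and the sample numbers are hypothetical, and the evaluation functions actually used by the method may be graded rather than binary.

```python
from typing import Callable, Sequence


def meets_upper_bound(value: float, limit: float) -> float:
    """Return 1.0 when the value stays at or below its preset limit (e.g. latency)."""
    return 1.0 if value <= limit else 0.0


def meets_lower_bound(value: float, floor: float) -> float:
    """Return 1.0 when the value reaches its preset floor (e.g. throughput)."""
    return 1.0 if value >= floor else 0.0


def slice_index(values: Sequence[float], presets: Sequence[float],
                evaluate: Callable[[float, float], float]) -> float:
    """Average the per-service evaluation values of one target parameter."""
    if len(values) != len(presets) or not values:
        raise ValueError("need one preset per service and at least one service")
    return sum(evaluate(v, p) for v, p in zip(values, presets)) / len(values)


# Three services: measured latencies (ms) against their preset limits (ms).
latency_index = slice_index([4.0, 12.0, 9.0], [5.0, 10.0, 10.0], meets_upper_bound)
```

With these sample values two of the three services meet their limit, so the index lies between 0 and 1, with 1 meaning every service in the slice is satisfied.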
[0033] Furthermore, calculating the information freshness index of the target network slice includes:
[0034]
[0035] Here, $a_m(t+1)$ represents the information freshness of service m at time t+1; a binary variable indicates whether the power virtual private network device successfully transmitted data in time slot t; $d(t)$ represents the end-to-end delay of data transmission by the power virtual private network device within time slot t; $a_0$ is a fixed constant preset by the system; $a(t)$ represents the average information freshness; $f_A(t)$ represents the information freshness index evaluation function; $A_m$ represents the preset timeliness index of transmission service m; $M_t$ represents the total number of services in time slot t; and $E_A(t)$ represents the information freshness index.
[0036] As can be seen from the above description, the above function can describe the freshness of information at different times. By setting a fixed constant a0, the freshness of information will not remain at the same value for a long time when no new data arrives, thus better reflecting the timeliness of information.
[0037] Furthermore, calculating the reliability index of the target network slice includes:
[0038]
[0039] $p_{\mathrm{soft},u} \in (0,1)$;
[0040] $p_{\mathrm{phy},i} \in (0,1)$;
[0041]
[0042]
[0043] Here, $R_m(t)$ represents the reliability of service m at time t; u is a virtual node in the set of nodes for service m; a binary variable indicates whether the underlying service node i used for transmitting service m is mapped to the virtual node u; $p_{\mathrm{soft},u} \in (0,1)$ represents the probability that software node u works normally; $p_{\mathrm{phy},i} \in (0,1)$ represents the probability that physical node i works normally; $f_R(t)$ represents the reliability index evaluation function; $R_m$ represents the preset reliability index of transmission service m; $M_t$ represents the total number of services in time slot t; and $E_R(t)$ represents the reliability index.
[0044] As described above, by considering the probability of virtual nodes working normally and the probability of physical nodes working normally, and by adjusting the probability according to time changes, the reliability model can more accurately reflect the dynamic reliability of nodes or links.
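A minimal sketch of this reliability calculation, assuming the service reliability is the product, over every virtual node of the service, of that node's software availability and the hardware availability of the physical node hosting it. The function name `service_reliability`, the dictionary layout, and the probabilities are hypothetical.

```python
from math import prod


def service_reliability(mapping: dict[str, str],
                        p_soft: dict[str, float],
                        p_phy: dict[str, float]) -> float:
    """Multiply software and hardware availability over all mapped node pairs.

    mapping[u] = i means virtual node u is hosted on physical node i.
    """
    return prod(p_soft[u] * p_phy[i] for u, i in mapping.items())


# Two VNFs chained for one service, each hosted on a different physical node.
mapping = {"vnf_a": "node_1", "vnf_b": "node_2"}
p_soft = {"vnf_a": 0.99, "vnf_b": 0.98}
p_phy = {"node_1": 0.999, "node_2": 0.995}
reliability = service_reliability(mapping, p_soft, p_phy)
```

Because every factor is below 1, adding nodes to a service chain can only lower the result, which is why node placement matters for meeting a preset reliability index.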
[0045] Furthermore, calculating the latency metric of the target network slice includes:
[0046]
[0047] Here, $d(t)$ represents the end-to-end delay of data transmission by the power virtual private network device within time slot t, which is made up of the access network latency and the core network latency; $x_{m,k}$ represents the connection relationship between service terminal m and base station k, with $x_{m,k}=1$ indicating a connection; $D_m$ represents the data packet size; $r_{m,k}$ represents the upload data rate of service m at base station k; n denotes a node in the set of nodes for service m; $f(D_m)$ represents the computing resources required to transmit data of size $D_m$; $C_{m,n}(t)$ represents the amount of computing resources allocated to transmission service m on node n; l denotes a link in the corresponding set of links; $d_{m,l}$ represents the distance of link l; $B_{m,l}(t)$ represents the amount of bandwidth resources allocated to link l; $f_D(t)$ represents the delay index evaluation function; D represents the fixed delay; $M_t$ represents the total number of services in time slot t; and $E_D(t)$ represents the delay index.
[0048] As described above, the latency calculation process takes into account both access network latency and core network latency. Access network latency is described by the size of data packets and the relationship between the service terminal and the base station, while core network latency is described by data such as the amount of computing resources required to transmit data, the amount of computing resources allocated to the transmission service, and the link distance. This allows for a more accurate reflection of the service latency.
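The latency decomposition can be illustrated with a short sketch. Several modeling choices here are assumptions rather than the patent's formulas: the required computation is taken to be linear in packet size (`cycles_per_bit`), and each link contributes a transmission term (packet size over allocated bandwidth) plus a propagation term (distance over a hypothetical propagation speed).

```python
def access_delay(packet_bits: float, uplink_rate_bps: float) -> float:
    """Access network delay: packet size divided by the terminal's upload rate."""
    return packet_bits / uplink_rate_bps


def core_delay(packet_bits: float, cycles_per_bit: float,
               node_cpu_hz: list[float],
               links: list[tuple[float, float]],
               propagation_mps: float = 2e8) -> float:
    """Core network delay: computation delay on each node plus delay on each link.

    links holds (distance_m, allocated_bandwidth_bps) pairs.
    """
    compute = sum(cycles_per_bit * packet_bits / cpu for cpu in node_cpu_hz)
    link = sum(packet_bits / bw + dist / propagation_mps for dist, bw in links)
    return compute + link


# One 8,000-bit packet crossing two nodes and two links.
d_access = access_delay(8_000, 4e6)
d_core = core_delay(8_000, 100, [2e9, 1e9], [(10_000, 1e8), (20_000, 1e8)])
d_total = d_access + d_core
```

Allocating more computing resources to a node or more bandwidth to a link shrinks the corresponding term, which is the lever the orchestration method adjusts.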
[0049] Furthermore, calculating the throughput metric of the target network slice includes:
[0050]
[0051] Here, $f_T(t)$ represents the throughput metric evaluation function; $r(t)$ represents the actual throughput of the deterministic service, which is compared against the minimum uplink data transmission rate of the slice, i.e., the minimum throughput required for transmitting service m; $E_T(t)$ represents the throughput metric; and $M_t$ represents the total number of services in time slot t.
[0052] As described above, by obtaining the actual throughput of a deterministic service and comparing it with the minimum throughput required for the transmission service, it is possible to effectively determine whether the actual throughput meets the service transmission requirements.
[0053] Furthermore, the step of constructing a resource allocation optimization model based on the information freshness index, reliability index, latency index, and throughput index of the target network slice includes:
[0054] $Q_m = \beta_1 E_D(t) + \beta_2 E_T(t) + \beta_3 E_A(t) + \beta_4 E_R(t)$;
[0055] Here, $Q_m$ represents the performance of a single network slice m; $E_D(t)$ represents the delay index; $E_T(t)$ represents the throughput metric; $E_A(t)$ represents the information freshness index; $E_R(t)$ represents the reliability index; and $\beta_1$ to $\beta_4$ represent the weight values of the corresponding indices.
[0056] As described above, by setting corresponding weight values for different indicators, the weight values of each indicator can be adjusted according to actual business needs, thereby better evaluating the performance of the target network slice.
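The weighted combination is straightforward to express in code. The function name `slice_performance` and the equal default weights are hypothetical; the source does not state how the weights are chosen or whether they must sum to 1.

```python
def slice_performance(e_d: float, e_t: float, e_a: float, e_r: float,
                      betas: tuple[float, float, float, float] = (0.25, 0.25, 0.25, 0.25)) -> float:
    """Weighted sum of the delay, throughput, freshness and reliability indices."""
    b1, b2, b3, b4 = betas
    return b1 * e_d + b2 * e_t + b3 * e_a + b4 * e_r


# A latency-sensitive service might weight the delay index more heavily.
q_m = slice_performance(0.9, 1.0, 0.8, 1.0, betas=(0.4, 0.2, 0.2, 0.2))
```

When each index lies in [0, 1] and the weights sum to 1, the result also lies in [0, 1], which keeps scores comparable across slices.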
[0057] Furthermore, the optimization solution of the resource allocation optimization model using deep reinforcement learning algorithms and Markov decision process models includes:
[0058] The resource allocation optimization model is modeled as a Markov decision process, which includes a reward function.
[0059] The resource allocation result is obtained by the deep reinforcement learning algorithm based on the current state of the target network slice and historical experience.
[0060] Determine whether the resource allocation result meets the business requirements; if not, apply a negative penalty in the reward function.
[0061] The deep reinforcement learning algorithm iterates repeatedly, and in each iteration, the resource allocation result is adjusted according to the reward function until the business requirements are met.
[0062] As described above, by designing a reward mechanism, deep reinforcement learning algorithms can continuously improve their strategies to maximize the performance evaluation of the target network slice.
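One way to read the penalty rule is sketched below: the reward equals the slice performance score when the business requirements are met, and the same score minus a fixed penalty otherwise. The function name `slice_reward`, the subtraction form, and the default penalty are assumptions, since the source does not give the exact reward expression.

```python
def slice_reward(q_m: float, requirements_met: bool, penalty: float = 1.0) -> float:
    """Reward for one allocation step, reduced by a penalty when requirements are missed."""
    if requirements_met:
        return q_m
    return q_m - penalty


good_step = slice_reward(0.9, requirements_met=True)
bad_step = slice_reward(0.5, requirements_met=False, penalty=2.0)
```

A penalty larger than the maximum achievable score makes every violating allocation strictly worse than every compliant one, which pushes the learned policy toward allocations that satisfy the requirements.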
[0063] Furthermore, the deep reinforcement learning algorithm includes a model-free deep reinforcement learning algorithm based on maximum entropy.
[0064] As described above, the model-free deep reinforcement learning algorithm based on maximum entropy is used to solve the problem. The optimal strategy of this algorithm is to maximize its entropy regularization reward, which makes the algorithm more stable.
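In maximum-entropy reinforcement learning generally, the per-step reward is augmented with the entropy of the policy's action distribution, scaled by a temperature coefficient. The sketch below shows that augmentation for a discrete action distribution; the names `policy_entropy` and `soft_reward` and the temperature value `alpha` are illustrative, and the source does not name a specific algorithm.

```python
import math
from typing import Sequence


def policy_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_reward(reward: float, probs: Sequence[float], alpha: float = 0.2) -> float:
    """Entropy-regularized reward: environment reward plus alpha times policy entropy."""
    return reward + alpha * policy_entropy(probs)


uniform_bonus = soft_reward(1.0, [0.25, 0.25, 0.25, 0.25])
greedy_bonus = soft_reward(1.0, [1.0, 0.0, 0.0, 0.0])
```

The uniform policy earns a larger regularized reward than the deterministic one for the same environment reward, which encourages continued exploration of allocation choices during training.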
[0065] Another embodiment of the present invention provides an end-to-end deterministic slice orchestration apparatus based on deep reinforcement learning, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the various steps of the end-to-end deterministic slice orchestration method based on deep reinforcement learning described above.
[0066] The end-to-end deterministic slicing orchestration method and apparatus based on deep reinforcement learning provided by this invention can be applied to network resource allocation scenarios in power grids. By dynamically optimizing network resource allocation, it meets the deterministic service quality requirements of power services for latency, reliability, and other aspects. The following is a detailed description of the implementation methods:
[0067] Example 1
[0068] Referring to Figure 1, an end-to-end deterministic slice orchestration method based on deep reinforcement learning includes:
[0069] S1. Physical network nodes are mapped to a virtual network using virtualization technology to obtain an end-to-end network slice virtualization model. Taking 5G network slice deployment as an example, different service scenarios place different resource requirements on the underlying physical network, and the deployment structure of network slices differs accordingly. Specifically:
[0070] Referring to Figure 2, network slices of different service types are deployed on the underlying physical network composed of the access network, transmission network, and core network, and the slices are interconnected and cooperate with each other. The architecture of end-to-end deterministic network slicing mainly includes a physical sensing layer, a 5G power private network slicing layer, and an application service layer, as well as the interfaces between these layers. The physical sensing layer includes power service terminals, base stations, transmission equipment (such as routers and switches), and servers, and is a crucial foundation for achieving deterministic transmission in 5G power virtual private network slicing. The 5G power private network slicing layer is responsible for the virtual mapping of physical network nodes and links, and for network resource scheduling and management. The application service layer includes various applications such as slice service management and control.
[0071] Network slicing includes access network slicing and core network slicing. Access network slicing is used to handle the transmission from service terminals to base stations, while core network slicing is used to manage the transmission links and computing resource allocation for services. The underlying physical network and the mapped virtual network span the access network domain and the core network domain. When a network slice is requested for deployment, each node corresponds one-to-one with a node on the underlying physical network, and each link corresponds to the physical link connecting the node's location.
[0072] Each node has a computational resource value attribute, and each communication link has a bandwidth resource value attribute. The physical infrastructure in a 5G power virtual private network is represented as an undirected graph $G = (G_N, G_L)$, where $G_N = \{1,2,\ldots,n,\ldots,N\}$ represents the set of nodes and $G_L = \{1,2,\ldots,l,\ldots,L\}$ represents the set of links. The virtual network in a network slice is likewise represented as an undirected graph consisting of the set of nodes and the set of links in the virtual slice network. The computing resources on the nodes are represented as $C = \{C_1, C_2, \ldots, C_n, \ldots, C_N\}$. Physical nodes are connected via wired links, and the bandwidth resources on these links are represented as $B = \{B_1, B_2, \ldots, B_l, \ldots, B_L\}$.
[0073] For example, an access network consists of k different cells, denoted as the set K = {1, 2, ..., k}. Each cell consists of a base station located at the cell center and m randomly distributed power service terminals, denoted as the set M = {1, 2, ..., m}. Whether service terminal m is connected to base station k is represented by a binary variable $x_{m,k}$: when $x_{m,k} = 1$, service terminal m accesses cell k. Assuming that each service terminal m can request only one service at a time, terminal m can be connected to only one base station, expressed as:
[0074]
[0075] In the access network, the total bandwidth of B Hz is divided into a group of Physical Resource Blocks (PRBs). Within time slot t, each resource block in the cell's set $R_{RB} = \{1,2,\ldots,r\}$ can be scheduled to only one power service terminal m, which is indicated by a binary variable that marks whether power service terminal m uses spectrum resource block r of the cell. The number of RB resources required by service m is:
[0076]
[0077] The RB resources allocated to all services on base station k cannot exceed the maximum RB limit of time-frequency resources.
[0078]
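The two access-network constraints described above, a single base station per terminal and a per-base-station resource block budget, can be checked with a sketch like the following. It assumes each terminal is attached to exactly one base station and that the RB demand of each service is already known; the function names and sample data are hypothetical.

```python
def association_is_valid(x: list[list[int]]) -> bool:
    """x[m][k] is 1 when terminal m is attached to base station k, else 0."""
    return all(set(row) <= {0, 1} and sum(row) == 1 for row in x)


def rb_budget_respected(x: list[list[int]], rb_demand: list[int], rb_max: int) -> bool:
    """Check that the RBs requested at each base station stay within rb_max."""
    n_stations = len(x[0])
    for k in range(n_stations):
        used = sum(rb_demand[m] for m, row in enumerate(x) if row[k] == 1)
        if used > rb_max:
            return False
    return True


# Three terminals, two base stations; terminals 1 and 2 share station 1.
x = [[1, 0], [0, 1], [0, 1]]
rb_demand = [3, 4, 5]
feasible = association_is_valid(x) and rb_budget_respected(x, rb_demand, rb_max=9)
```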
[0079] The upload data rate of power service m at base station k within time slot t is expressed as:
[0080]
[0081] Here, p represents the transmit power, g represents the channel gain, the path loss from power terminal m to base station k is characterized by the path loss coefficient ω, and $\sigma^2$ represents the power of additive white Gaussian noise.
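As an illustration of how these quantities typically combine, the sketch below uses the standard Shannon capacity expression. This is an assumption: the patent's own rate formula is not reproduced here, and `channel_gain` is taken to already include the effect of path loss.

```python
import math


def uplink_rate_bps(bandwidth_hz: float, tx_power_w: float,
                    channel_gain: float, noise_power_w: float) -> float:
    """Shannon-capacity estimate of the upload rate over the allocated bandwidth."""
    snr = tx_power_w * channel_gain / noise_power_w
    return bandwidth_hz * math.log2(1.0 + snr)


# Hypothetical figures: one 180 kHz resource block, 0.2 W transmit power.
rate = uplink_rate_bps(180e3, 0.2, 1e-9, 1e-13)
```

Under this model the rate grows linearly with the allocated bandwidth but only logarithmically with transmit power, so granting more resource blocks is the more effective way to raise a terminal's upload rate.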
[0082] To ensure the QoS of the service, the uplink data rate should meet the minimum uplink transmission data rate of the slice:
[0083]
[0084] The threshold is the minimum uplink data rate required for service m; different services have different minimum uplink data rates, and the network slice controller obtains the value in advance from the service intent input.
[0085] In the core network, a binary variable is defined to indicate whether the underlying service node i used for transmitting service m is mapped to the virtual node u, and a second binary variable indicates whether the physical link (i,j) is mapped to the virtual link (u,v). The total available computing resources on each node are $C_{\max}$, and the computing resources allocated on virtual node u must satisfy this computing resource constraint, where $C_{m,u}(t)$ represents the computing resources allocated for service m on virtual node u in time slot t, and $C = \{C_1, C_2, \ldots, C_n, \ldots, C_N\}$ represents the set of computing resources across the N physical nodes. The maximum available bandwidth resource on each link is $B_{\max}$, and the link bandwidth resource constraint must likewise be met, where $B_{m,(u,v)}(t)$ represents the bandwidth resources allocated for service m on the virtual link (u,v) in time slot t, and $B = \{B_1, B_2, \ldots, B_l, \ldots, B_L\}$ represents the set of bandwidth resources on the L physical links. The tuple $(C_t, B_t)$ represents the resource allocation set within time slot t, whose elements are the computing resources and bandwidth resources allocated to the network slice within time slot t, respectively. Different types of power services have different requirements for QoS indicators; deterministic transmission needs to ensure that the total latency of the network slice is within a given range.
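The node and link capacity constraints can be checked with one helper, sketched here under the assumption that each constraint bounds the sum of per-service allocations on a node or link by its maximum. The function name `within_capacity`, the dictionary layout, and the numbers are hypothetical.

```python
def within_capacity(allocations: dict, capacity: float) -> bool:
    """Check that the per-service allocations on every resource sum to at most capacity.

    allocations maps a node (or link) identifier to {service: allocated amount}.
    """
    return all(sum(per_service.values()) <= capacity
               for per_service in allocations.values())


# Computing resources per virtual node and bandwidth per virtual link.
cpu_alloc = {"u1": {"m1": 2.0, "m2": 1.5}, "u2": {"m1": 3.0}}
bw_alloc = {("u1", "u2"): {"m1": 40.0, "m2": 70.0}}

cpu_ok = within_capacity(cpu_alloc, capacity=4.0)
bw_ok = within_capacity(bw_alloc, capacity=100.0)
```

In this example the computing allocation is feasible while the link is oversubscribed, so an orchestrator would have to reduce one service's bandwidth or reroute it.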
[0086] S2. Obtain resource data for the target network slice. Based on the resource data, calculate the Age of Information (AoI), reliability, latency, and throughput metrics for the target network slice. Specifically, based on the service requirements of the power industry, propose a resource allocation optimization model that includes information freshness and reliability requirements, aiming to maximize the long-term average satisfaction of the network slice. After obtaining the parameter values of the target parameters to be calculated and the corresponding preset comparison values, obtain the evaluation value of the target parameter based on the ratio of the parameter value to the preset comparison value. Then, obtain the evaluation values of all target parameters under the target network slice and calculate the average value to obtain the target metrics corresponding to the target parameters.
[0087] (1) AoI refers to the time difference between the current moment and the last received message. Specifically, for a given information system, the smaller the AoI value, the newer the information and the more timely the transmission; conversely, the larger the AoI value, the older the information and the worse the timeliness. Information freshness is used to measure the real-time performance of power business data. By monitoring AoI, network resources can be dynamically adjusted to improve the timeliness of business transmission. AoI can serve as a key indicator for measuring network QoS. By evaluating AoI, the allocation of network resources can be optimized, and the overall service quality can be improved. Using AoI to measure information freshness, the actual AoI for deterministic services is defined as:
[0088]
[0089] Here, $a_m(t+1)$ represents the information freshness of service m at time t+1; a binary variable indicates whether the power virtual private network device successfully transmitted data in time slot t; and $d(t)$ represents the end-to-end delay of data transmission by the power virtual private network device within time slot t. If the device successfully receives data during time slot t, the AoI is reset to $d(t)$; otherwise, the AoI is increased by $a_0$, a fixed constant preset by the system.
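The update rule just described (reset to the end-to-end delay on a successful delivery, otherwise grow by the preset constant) can be sketched directly. The function name `next_aoi`, the starting value, and the sample slot sequence are hypothetical.

```python
def next_aoi(current_aoi: float, delivered: bool, delay: float, a0: float) -> float:
    """AoI for slot t+1: the delay d(t) after a successful delivery, else AoI plus a0."""
    return delay if delivered else current_aoi + a0


# Four time slots: success, two misses, then success again.
aoi = 0.0
trace = []
for delivered, delay in [(True, 2.0), (False, 0.0), (False, 0.0), (True, 3.0)]:
    aoi = next_aoi(aoi, delivered, delay, a0=1.0)
    trace.append(aoi)
```

The trace rises steadily across the missed slots and drops back once data is delivered, which is the sawtooth behavior that makes AoI a useful freshness measure.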
[0090] The average AoI for power deterministic services m is expressed as:
[0091]
[0092] For services in 5G power virtual private networks that involve a large number of control commands and have high timeliness requirements, the AoI must not exceed a specified threshold; otherwise, delays in control commands can lead to unpredictable consequences. Therefore, AoI is a crucial indicator for deterministic services. For deterministic service processing, when the information age value exceeds the value preset for the service, the network transmission of the service is considered invalid. The service information age index evaluation function is defined as follows:
[0093]
[0094] $a(t)$ represents the timeliness of a deterministic service, and $A_m$ represents the preset timeliness index of transmission service m. Based on this, an information age evaluation function is defined to describe the satisfaction of deterministic services with transmission:
[0095]
[0096] $M_t$ represents the total number of services in time slot t, and $E_A(t)$ represents the information freshness index.
[0097] (2) Reliability is crucial for consistent performance and uninterrupted service, and is a key aspect of ensuring the determinism of the entire network. Service nodes may experience hardware or software failures (e.g., unexpected physical node restarts or shutdowns, network disconnections, software errors, etc.) due to software and hardware damage or external environmental factors, leading to interruptions and failures in power grid service transmission tasks. Reliability models assess the reliability of power service transmission links by calculating the working probabilities of virtual and physical nodes to ensure service continuity during network interruptions or failures. Software reliability is defined as the probability that a VNF successfully completes its expected service processing. Software reliability depends on the complexity of the VNF, configuration errors, and software errors. The probability of each VNF node u working normally is $p_{\mathrm{soft},u} \in (0,1)$. Hardware reliability is related to the device quality of the physical node carrying the VNF, and the probability that each physical node i is working normally is $p_{\mathrm{phy},i} \in (0,1)$.
[0098] Power service m transmits data through a series of virtual links connected by VNFs. The probability of data transmission failure and server node failure for each service is defined. The reliability of service m can be calculated as the product of the reliability of multiple nodes in the link. A reliability model is established as follows:
[0099]
[0100] Here, $R_m(t)$ represents the reliability of service m at time t. Because the state of a VNF or physical node may change, for example through failure or repair, the reliability model adjusts the probabilities over time to reflect the dynamic reliability of nodes and links. u is a virtual node in the set of nodes for service m. The characteristic function of the service transmission reliability index is defined as:
[0101]
[0102] Here, $R(t)$ represents the reliability of the deterministic service, and $R_m$ represents the preset reliability index of transmission service m. Based on this, the reliability evaluation function is defined as:
[0103]
[0104] $f_R(t)$ represents the reliability index evaluation function, and $E_R(t)$ represents the reliability index.
[0105] (3) The end-to-end latency of a 5G power virtual private network is the latency of all network links and devices traversed from the time a service request is sent by the requesting terminal until it is processed by the server. For end-to-end network slicing requests, the user's latency includes access network latency and core network latency; therefore, the latency is defined as:
[0106]
[0107] The time delay from power equipment terminal m to base station k is the access network delay, expressed as:
[0108]
[0109] Core network latency mainly consists of computation latency and link latency. Computation latency refers to the time from when service data is received and processed at the processing node to when it is forwarded onto the link. Link latency refers to the time spent transmitting data on the link. Therefore, core network latency can be expressed as:
[0110]
[0111] Here, $d(t)$ represents the end-to-end delay of data transmission by the power virtual private network device within time slot t, which is made up of the access network latency and the core network latency; $x_{m,k}$ represents the connection relationship between service terminal m and base station k, with $x_{m,k}=1$ indicating a connection; $D_m$ represents the data packet size (in bits); $r_{m,k}$ represents the upload data rate of service m at base station k; n denotes a node in the set of nodes for service m; $f(D_m)$ represents the computing resources required to transmit data of size $D_m$; $C_{m,n}(t)$ represents the amount of computing resources allocated to transmission service m on node n; l denotes a link in the corresponding set of links; $d_{m,l}$ represents the distance of link l; and $B_{m,l}(t)$ represents the amount of bandwidth resources allocated to link l.
[0112] Slice resource scheduling is guided by the mapping relationship between performance indicators and resources, and the characteristic function of transmission delay is defined as:
[0113]
[0114] The end-to-end latency evaluation function for a service within time t is defined as follows:
[0115]
[0116] f_D(t) represents the delay index evaluation function, and D represents the fixed delay; E_D(t) represents the delay index.
[0117] (4) The characteristic function of the service transmission throughput index is defined as:
[0118]
[0119] Where f_T(t) represents the throughput metric evaluation function, and r(t) represents the actual throughput of the deterministic service. The minimum uplink data transmission rate for a slice is defined as the minimum throughput required for transmission service m. Based on this, the transmission throughput evaluation function for time slot t is defined as follows:
[0120]
[0121] E_T(t) represents the throughput metric.
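The evaluation functions are not legible in this text. Claim 2 describes the general pattern: each service's measured value is compared with its preset value, and the per-service evaluation values are averaged to give the index. A minimal sketch of that pattern follows. The comparison rule `throughput_score` is purely hypothetical (full score when the minimum rate is met, otherwise the fraction achieved) and is not taken from the patent.

```python
from typing import Callable, Sequence


def metric_index(values: Sequence[float], presets: Sequence[float],
                 evaluate: Callable[[float, float], float]) -> float:
    """Average the per-service evaluation values to obtain an index in [0, 1]."""
    if not values or len(values) != len(presets):
        raise ValueError("values and presets must be non-empty and equal in length")
    scores = [evaluate(v, p) for v, p in zip(values, presets)]
    return sum(scores) / len(scores)


def throughput_score(actual_rate: float, min_rate: float) -> float:
    """Hypothetical comparison rule: 1.0 if the minimum rate is met,
    otherwise the fraction of the minimum rate that was achieved."""
    return min(actual_rate / min_rate, 1.0)
```

The same `metric_index` helper could be reused for latency, reliability, and information freshness by passing a different comparison rule for each.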
[0122] S3. Construct a resource allocation optimization model based on the information freshness, reliability, latency, and throughput metrics of the target network slice. Specifically:
[0123] Each end-to-end network slice considers four key metrics: latency (D), throughput (r), timeliness (A), and reliability (R). Each element of service satisfaction lies in the range {E_D(t), E_T(t), E_A(t), E_R(t)} ∈ [0,1]; the closer a value is to 1, the higher the service satisfaction. Based on the above analysis, to measure the QoS of the system, the network slicing performance evaluation function Q_m is used as the criterion for evaluating whether the network completes transmission according to the specific deterministic service indicators within a given transmission period; Q_m represents only the performance of a single network slice m:
[0124] Q_m = β1·E_D(t) + β2·E_T(t) + β3·E_A(t) + β4·E_R(t);
[0125] β1-β4 represent the weight values corresponding to the indicators, respectively;
[0126] The transmission status of deterministic services is determined according to Q_m in order to guide link resource scheduling; the optimization objective is to maximize the performance evaluation function of network slices, and the optimization model is shown in the equation:
[0127]
[0128] C6:β1+β2+β3+β4=1.
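As an illustration of the weighted sum Q_m and constraint C6, the following sketch computes the slice performance from the four indexes. Only C6 is legible in this text; the remaining constraints of the optimization model are not reproduced, so they are not checked here. The function name and the default equal weights are assumptions.

```python
def slice_performance(e_d: float, e_t: float, e_a: float, e_r: float,
                      betas=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Q_m = b1*E_D(t) + b2*E_T(t) + b3*E_A(t) + b4*E_R(t), with sum(betas) == 1 (C6)."""
    metrics = (e_d, e_t, e_a, e_r)
    if len(betas) != 4:
        raise ValueError("exactly four weights are required")
    if any(not 0.0 <= e <= 1.0 for e in metrics):
        raise ValueError("each index must lie in [0, 1]")
    if abs(sum(betas) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1 (constraint C6)")
    return sum(b * e for b, e in zip(betas, metrics))
```

Because every index lies in [0, 1] and the weights sum to 1, Q_m also lies in [0, 1], so it can be compared directly across slices.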
[0129] S4. The resource allocation optimization model is optimized using deep reinforcement learning algorithms and Markov decision process models to obtain the resource allocation scheme for the target network slice. Specifically:
[0130] S41. Model the resource allocation optimization model as a Markov Decision Process (MDP), including a set of states, a set of actions, a reward function, and a state transition matrix. For example, an MDP is a five-tuple consisting of a finite set of states, a finite set of actions, a state transition matrix, a reward function, and a discount factor γ. The end-to-end network slice controller, acting as the agent, observes the current state. The state observed in time slot t consists of ω_s(t), C_s(t), B_s(t), and the requirements of the arriving service, where ω_s(t) represents the available RB resources observed by the base station in time slot t; C_s(t) and B_s(t) represent the remaining computing resources of the nodes and the remaining bandwidth resources of the links in time slot t, respectively; and the service requirements comprise the maximum latency requirement, maximum data freshness requirement, minimum data rate, and minimum reliability requirement of the arriving service. The slice controller selects actions based on the observed state; the action set consists of ω(t), C(t), and B(t), which represent the RB resources, computing resources, and bandwidth resources allocated by the nodes and links to each service, respectively.
[0131] S42. Resource allocation results are obtained based on the current state of the target network slice and historical experience through deep reinforcement learning algorithms. For example, the network slice controller selects the optimal resource allocation scheme based on the current state and historical experience. The selected action determines the allocation of computing resources and bandwidth resources. The ultimate goal of the network slice controller is to maximize the performance evaluation function under the constraints of the time-slot system. After taking a resource allocation action, the network slice controller can immediately obtain a reward. By designing a reward mechanism, the agent can continuously improve its strategy to maximize the performance evaluation function. The solution is only effective when all constraints are met; otherwise, the optimization problem needs to be solved again to obtain a new solution.
[0132] S43. Determine whether the resource allocation result meets the business requirements. If not, apply a negative penalty in the reward function. That is, introduce a penalty mechanism into the reward function. If the network slice controller's actions do not meet the QoS requirements of the business, apply a negative penalty in the reward function to guide the network slice controller to make better resource allocation.
[0133] Define the reward as:
[0134]
[0135] The penalty term is a negative constant; if the action taken does not meet the constraints, a penalty is applied in the reward.
[0136] S44. The deep reinforcement learning algorithm iterates repeatedly, adjusting resource allocation based on the reward function in each iteration until business requirements are met. That is, the network slice controller gradually improves its strategy through repeated iterations. In each iteration, the network slice controller adjusts its strategy based on feedback to maximize cumulative rewards and ensure efficient resource utilization. Furthermore, employing a multi-agent system from deep reinforcement learning allows multiple agents to work collaboratively, optimizing network resource utilization efficiency and adaptively adjusting to changes in network load, ensuring the normal operation of critical services.
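Steps S41 to S44 can be sketched as plain data structures plus a penalized reward. The reward formula is not legible in this text, so the piecewise form below (Q_m when the action is feasible and QoS is met, a negative constant otherwise) is an assumption, as are all class names, field names, and the penalty value of -1.0.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Requirement:
    max_latency_s: float    # maximum latency requirement
    max_age_s: float        # maximum data freshness (age) requirement
    min_rate_bps: float     # minimum data rate
    min_reliability: float  # minimum reliability requirement


@dataclass(frozen=True)
class State:
    rb_available: int           # available RB resources at the base station
    cpu_remaining: List[float]  # remaining computing resources per node
    bw_remaining: List[float]   # remaining bandwidth per link
    requirement: Requirement    # requirements of the arriving service


@dataclass(frozen=True)
class Action:
    rb: int            # RB resources allocated
    cpu: List[float]   # computing resources allocated per node
    bw: List[float]    # bandwidth allocated per link


def within_capacity(state: State, action: Action) -> bool:
    """True if the allocation does not exceed any remaining resource."""
    return (0 <= action.rb <= state.rb_available
            and len(action.cpu) == len(state.cpu_remaining)
            and len(action.bw) == len(state.bw_remaining)
            and all(0 <= a <= c for a, c in zip(action.cpu, state.cpu_remaining))
            and all(0 <= a <= b for a, b in zip(action.bw, state.bw_remaining)))


def meets_qos(req: Requirement, latency_s: float, age_s: float,
              rate_bps: float, reliability: float) -> bool:
    """True if the measured performance satisfies all four service requirements."""
    return (latency_s <= req.max_latency_s and age_s <= req.max_age_s
            and rate_bps >= req.min_rate_bps and reliability >= req.min_reliability)


def reward(q_m: float, feasible: bool, qos_ok: bool, penalty: float = -1.0) -> float:
    """Assumed reward: Q_m if the action is valid, otherwise a negative constant."""
    if penalty >= 0:
        raise ValueError("penalty must be a negative constant")
    return q_m if (feasible and qos_ok) else penalty
```

Keeping the feasibility check separate from the QoS check makes it possible to log which kind of violation triggered the penalty during training.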
[0137] Referring to Figure 3, based on the end-to-end deterministic network service slicing architecture of 5G power virtual private networks, this invention proposes a deterministic service resource allocation method based on Soft Actor-Critic (SAC). SAC is an off-policy algorithm for continuous action spaces built on the maximum entropy RL framework. Its optimal policy maximizes the entropy-regularized reward, which makes the algorithm more stable. In addition to learning a policy that maximizes the expected cumulative reward, the objective function of the SAC algorithm also maximizes the entropy of the actions output by the policy.
[0138]
[0139] To automatically update the temperature coefficient α, the optimized loss function is:
[0140]
[0141] The SAC value function and policy are each fitted by a parameterized neural network. The actor network is updated by minimizing the KL divergence. Two Q-networks are used, and the smaller of the two Q-values is selected as the target Q-value each time to ensure fast and stable training. The critic network is updated by minimizing the Bellman error. The end-to-end slice deterministic service resource allocation algorithm specifically includes the following steps:
[0142] 1. Initialize the slice controller training environment of the system and configure the SAC network model in deep reinforcement learning. This network model includes a main network and a target network. Set up the experience replay pool D;
[0143] 2. Start the iterative optimization process, set the number of iterations M, initialize the environment state of the slice controller, and start observing the current network state. Proceed to step 3;
[0144] 3. At each time step t, the slice controller selects an action using a greedy strategy, executes the resource allocation action, calculates the feedback reward, and enters the next state;
[0145] 4. The experience (s_t, a_t, r_t, s_{t+1}) is stored in the experience replay pool D, and experience data is randomly sampled from the experience replay pool to train the policy network;
[0146] 5. Calculate the temporal-difference target, minimize the loss function using gradient descent, and update the parameters of the SAC main network. The parameters of the SAC main network are then copied to the target network.
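The bookkeeping in steps 1 to 5 can be sketched without a deep learning library. The block below shows an experience replay pool, the standard SAC temporal-difference target that takes the smaller of two target Q-values, the standard SAC temperature loss, and the copy of main-network parameters to the target network. The SAC formulas used here are the textbook ones and are not copied from this document, whose equations are not legible. All names, the dictionary representation of parameters, and the `tau` blending option (with `tau=1.0` reproducing the plain copy described in step 5) are assumptions.

```python
import random
from collections import deque


class ReplayPool:
    """Fixed-capacity experience replay pool D holding (s, a, r, s_next) tuples."""

    def __init__(self, capacity, seed=None):
        self._buffer = deque(maxlen=capacity)  # oldest experience is dropped first
        self._rng = random.Random(seed)

    def store(self, state, action, reward, next_state):
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Uniformly sample up to batch_size experiences without replacement."""
        k = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), k)

    def __len__(self):
        return len(self._buffer)


def soft_td_target(reward, next_q1, next_q2, next_log_prob, alpha, gamma=0.99):
    """y = r + gamma * (min(Q1', Q2') - alpha * log pi(a'|s'))."""
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    return reward + gamma * soft_value


def temperature_loss(alpha, log_probs, target_entropy):
    """Batch mean of -alpha * (log pi(a|s) + target_entropy)."""
    return sum(-alpha * (lp + target_entropy) for lp in log_probs) / len(log_probs)


def sync_target(main_params, target_params, tau=1.0):
    """tau=1.0 copies the main parameters; tau<1.0 blends them (Polyak averaging)."""
    return {name: tau * main_params[name] + (1.0 - tau) * target_params[name]
            for name in main_params}
```

In a full implementation the Q-values and log-probabilities would come from neural networks and the losses would be minimized by an optimizer; only the scalar arithmetic is shown here.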
[0147] S5. Allocate resources to different services according to the resource allocation plan.
[0148] Embodiment 2
[0149] Referring to Figure 4, an end-to-end deterministic slice orchestration apparatus based on deep reinforcement learning includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the steps of the end-to-end deterministic slice orchestration method based on deep reinforcement learning described in Embodiment 1.
[0150] In summary, the end-to-end deterministic slice orchestration method and apparatus based on deep reinforcement learning provided by this invention ensure that the deterministic requirements of power services are met by introducing an optimization model that includes key indicators such as information freshness and reliability. The method targets 5G power virtual private networks. Compared with traditional fixed resource allocation strategies, it maintains high service satisfaction under different network conditions through dynamic resource allocation and intelligent scheduling, ensuring a stable and efficient service experience for power services in 5G virtual private networks, guaranteeing the stable operation of deterministic power network services, and improving resource utilization.
[0151] The above description is merely an embodiment of the present invention and does not limit the patent scope of the present invention. Any equivalent modifications made based on the content of the present invention specification and drawings, or direct or indirect applications in related technical fields, are similarly included within the patent protection scope of the present invention.
Claims
1. An end-to-end deterministic slice orchestration method based on deep reinforcement learning, characterized in that it comprises: mapping physical network nodes to virtual networks using virtualization technology to obtain an end-to-end network slicing virtualization model; obtaining resource data of the target network slice, and calculating the information freshness index, reliability index, latency index, and throughput index of the target network slice based on the resource data, wherein the resource data includes computing resources, RB resources, and reliability; constructing a resource allocation optimization model based on the information freshness index, reliability index, latency index, and throughput index of the target network slice; optimizing and solving the resource allocation optimization model by a deep reinforcement learning algorithm and a Markov decision process model to obtain the resource allocation scheme of the target network slice; and allocating resources to different services according to the resource allocation scheme; wherein constructing the resource allocation optimization model based on the information freshness index, reliability index, latency index, and throughput index of the target network slice comprises: calculating the information freshness index, reliability index, latency index, and throughput index of the target network slice based on the resource data; and, based on these indexes, defining the information age evaluation function, reliability evaluation function, end-to-end latency evaluation function, and throughput evaluation function respectively, and constructing a resource allocation optimization model with the goal of maximizing the network slice performance evaluation function; and wherein optimizing and solving the resource allocation optimization model by the deep reinforcement learning algorithm and the Markov decision process model to obtain the resource allocation scheme of the target network slice comprises: optimizing and solving the resource allocation optimization model using a soft actor-critic algorithm and a Markov decision process model, with a reward function designed based on the performance evaluation function for policy training, thereby obtaining an end-to-end deterministic slice resource allocation scheme.
2. The end-to-end deterministic slice orchestration method based on deep reinforcement learning according to claim 1, characterized in that calculating the information freshness index, reliability index, latency index, and throughput index of the target network slice based on the resource data comprises: obtaining the parameter value of the target parameter to be calculated and the preset comparison value corresponding to the target parameter, the target parameters including information freshness, reliability, latency, and throughput; obtaining the evaluation value of the target parameter by comparing the parameter value with the preset comparison value; and obtaining the evaluation values of all target parameters under the target network slice and calculating their average to obtain the target index corresponding to the target parameter.
3. The end-to-end deterministic slice orchestration method based on deep reinforcement learning according to claim 2, characterized in that the information freshness index of the target network slice is calculated by formulas whose terms are defined as follows: the information freshness of service m in time slot t+1; a binary variable indicating whether the power virtual private network device successfully transmitted data in time slot t; the end-to-end delay of data transmission within time slot t of the power virtual private network device; the information freshness of service m in time slot t; a fixed constant preset for the system; the average information freshness; the total latency of a network slice; the information freshness evaluation function; the preset timeliness index of service m; the total number of services in time slot t; and the information freshness index, where A denotes information freshness.
4. The end-to-end deterministic slice orchestration method based on deep reinforcement learning according to claim 2, characterized in that the reliability index of the target network slice is calculated by formulas whose terms are defined as follows: the reliability of service m in time slot t; a virtual node; the set of nodes for service m and the number of nodes in that set; a binary variable indicating whether the underlying service node used to transmit service m is mapped to the virtual node; the probability that physical node i functions normally in time slot t; the probability that virtual node i functions normally in time slot t; the probability that virtual node u works properly; the probability that physical node i works properly; the average reliability over time slot t; the reliability evaluation function; the preset reliability index of service m; the total number of services in time slot t; and the reliability index, where R denotes reliability.
5. The end-to-end deterministic slice orchestration method based on deep reinforcement learning according to claim 2, characterized in that the latency index of the target network slice is calculated by formulas whose terms are defined as follows: the end-to-end delay of data transmission within time slot t of the power virtual private network device; the access network latency, where RAN denotes the access network; the connection relationship between service m and base station k, where a value of 1 indicates a connection; the data packet size; the upload data rate of service m at base station k; the core network latency, where CN denotes the core network; the set of nodes for service m, the number of nodes, and an individual node n; the amount of computing resources required for the size of the transmitted data; the amount of computing resources allocated to service m on node n; the corresponding set of links, the number of links, and an individual link l; the link distance; the amount of bandwidth resources allocated to the link; the delay evaluation function; the fixed delay; the total number of services in time slot t; and the latency index.
6. The end-to-end deterministic slice orchestration method based on deep reinforcement learning according to claim 2, characterized in that the throughput index of the target network slice is calculated by formulas whose terms are defined as follows: T denotes throughput and t denotes the time slot; the throughput evaluation function; the actual throughput of the deterministic service; the minimum uplink data transmission rate of the slice, i.e., the minimum throughput required by service m; the throughput index; and the total number of services in time slot t.
7. The end-to-end deterministic slice orchestration method based on deep reinforcement learning according to claim 1, characterized in that constructing the resource allocation optimization model based on the information freshness index, reliability index, latency index, and throughput index of the target network slice uses a formula whose terms are defined as follows: the performance of the target network slice z; the latency index; the throughput index; the information freshness index; the reliability index; and the weight values corresponding to the respective indexes.
8. The end-to-end deterministic slice orchestration method based on deep reinforcement learning according to claim 1, characterized in that optimizing and solving the resource allocation optimization model by the deep reinforcement learning algorithm and the Markov decision process model comprises: modeling the resource allocation optimization model as a Markov decision process that includes a reward function; obtaining the resource allocation result by the deep reinforcement learning algorithm based on the current state of the target network slice and historical experience; determining whether the resource allocation result meets the service requirements and, if not, applying a negative penalty in the reward function; and iterating the deep reinforcement learning algorithm repeatedly, adjusting the resource allocation result according to the reward function in each iteration until the service requirements are met.
9. The end-to-end deterministic slice orchestration method based on deep reinforcement learning according to claim 7, characterized in that the deep reinforcement learning algorithm includes a model-free deep reinforcement learning algorithm based on maximum entropy.
10. An end-to-end deterministic slice orchestration apparatus based on deep reinforcement learning, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements each step of the end-to-end deterministic slice orchestration method based on deep reinforcement learning according to any one of claims 1-9.
Citation Information
Patent Citations
Industrial edge network system architecture and resource scheduling method
CN112235836A
Network slice optimization processing method and system
CN113992524A
Resource allocation method for digital twin auxiliary end-to-end deterministic time delay network slices in industrial Internet of Things
CN119031484A